10. How does git really work?#

Today we will dig into how git really works. This will be a deep dive and provide a lot of details about git. It will conceptually reinforce important concepts and practically give you some ideas about how you might fix things when things go wrong.

Next week, we will built on this more on the practical side, but these concepts are very important for making sense of the more practical aspects of fixing things in git.

This deep dive in git is to help you build a correct, flexbile understanding of git so that you can use it independently and efficiently. The plumbing commands do not need to be a part of your daily use of git, but they are the way that we can dig in and see what actually happens when git creates a commit.

cd systems/ github-in-class-brownsarahm-1/

Important

git stores important content in files that it uses like variables.

For example:

cat .git/HEAD 
ref: refs/heads/main

holds the current branch and

cat .git/config 
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true
[remote "origin"]
	url = https://github.com/introcompsys/github-in-class-brownsarahm-1.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
	remote = origin
	merge = refs/heads/main
[branch "2-create-an-about-file"]
	remote = origin
	merge = refs/heads/2-create-an-about-file
[branch "fill-in-about"]
	remote = origin
	merge = refs/heads/fill-in-about
[branch "organization"]
	remote = origin

stores information about the different branches and remotes.

There are many:

ls .git
COMMIT_EDITMSG	description	info		refs
HEAD		hooks		logs
config		index		objects

Important

.gitignore is a file in th working direcotry that contains alist of files and patterns to not track.

We can see it is a hidden file in the working directory with ls -a

cd ../tiny-book/
ls -a
.			_config.yml		markdown.md
..			_toc.yml		notebooks.ipynb
.git			intro.md		references.bib
.gitignore		logo.png		requirements.txt
_build			markdown-notebooks.md

and what it contains:

cat .gitignore 
_build

10.1. Creating a repo from scratch#


We will start in the top level course directory.

cd ..
ls

Mine looks like this:

example				tiny-book
github-in-class-brownsarahm-1

Yours should also have your kwl repo, group repo, etc.

We can create an empty repo from scratch using git init <path>

Last time we used an existing directory like git init .

Today we will create a new directory:

git init test

we get this message again, see context from last week

hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint: 
hint: 	git config --global init.defaultBranch <name>
hint: 
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint: 
hint: 	git branch -m <name>
Initialized empty Git repository in /Users/brownsarahm/Documents/inclass/systems/test/.git/

It creates a folder and gives us a warning about branch names. If you have a new install you will not see this, because new versions of git have this by default.

We change into the new directory

cd test/

and then rename the branch

git branch -m main

To clarify we will look at the status

git status
On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)

Noticee are no commits, and no origin.

ls .git
HEAD		description	info		refs
config		hooks		objects

we’ve looked at most of these, but we have not been able to see the objects before. We will work with those now.

10.2. Searching the file system#

We can use the bash command find to search the file system note that this does not search the contents of the files, just the names.

find .git/objects/
.git/objects/
.git/objects//pack
.git/objects//info

we have a few items in that directory and the directory itself.

We can limit by type, to only files with the -type option set to f

find .git/objects/ --type f
find: --type: unknown primary or operator

I made a typo, so I removed the extra -

find .git/objects/ -type f

And we have no results. We have no objects yet. Because this is an empty repo

10.3. Git Objects#

There are 3 types:

  • blob objects: the content of your files (data)

  • tree objects: stores file names and groups files together (organization)

  • Commit Objects: stores information about the sha values of the snapshots

classDiagram class tree{ List: - hash: blob - string: type - string:file name } class commit{ hash: parent hash: tree string: message string: author string: time } class blob{ binary: contents } class object{ hash: name } object <|-- blob object <|-- tree object <|-- commit

10.3.1. Hashing objects#

Let’s create our first one. git uses hashes as the key. We give the hashing function some content, it applies the algorithm and returns us the hash as the reference to that object. We can also write to our database with this.

The git hash-object function works on files, but we do not have any files yet. We can create a file, but we do not have to. Remeembr, everything is a file. When we use things like echo it writes to th stdout file.

echo "test content" 
test content

which shows on our trminal. We can us a pipe to connect the stdout of on command to the stdin of the next.

echo "test content" | git hash-object -w --stdin

we get back the hash:

d670460b4b4aece5915caf5c68d12f560a9fe3e4

We can break down this command:

  • git hash-object would take the content you handed to it and merely return the unique key

  • -w option tells the command to also write that object to the database

  • --stdin option tells git hash-object to get the content to be processed from stdin instead of a file

  • the | is called a pipe (what we saw before was a redirect) it pipes a process output into the next command

  • echo would write to stdout, withthe pip it passes that to std in of the git-hash

and we can check if it wrote to the database.

find .git/objects/ -type f
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

and we see a file that it was supposed to have!

10.3.2. Viewing git objects#

We can try with cat

cat .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4 
xK??OR04f(I-.QH??+I?+?K?	


This is binary output that we cannot understand. Fortunately, git provides a utility. We can use cat-file to use the object by referencing at least 4 characters that are unique from the full hash, not the file name. (70460 will not work)

git cat-file -p d670

cat-file requires an option -p is for pretty print

test content

This is the content that we put in, as expected.

10.3.3. Hashing a file#

let’s create a file

echo "version 1" > test.txt

and store it in our database, by hashing it

git hash-object -w test.txt 
83baae61804e65cc73a7201a7252750c76066a30

we can look at what we have.

find .git/objects/ -type f
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30

Now this is the status of our repo.

classDiagram class d67046{ test content (blob) } class 83baae{ Version 1 (blob) }

We can check the typ of fils with -t and git cat-file

git cat-file -t 83ba
blob
git cat-file -t d670
blob

Notice, however, that we only have one file in the working directory.

ls
test.txt

Note

the workign directory and the git repo are not strictly the same thing, and can be different like this. Mostly they will stay in closer relationship that we currently have unless we use plumbling commands, but it is good to build a solid understanding of how the .git directory relates to your working directory.

So far, even though we have hashed the object, git still thinks the file is untracked, because it is not in the tree and there are no commits that point to that part of the tree.

git status
On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	test.txt

nothing added to commit but untracked files present (use "git add" to track)

We can write a tree

git write-tree
4b825dc642cb6eb9a060e54bf8d69288fbee4904

and look at the tree

git cat-file -p 4b82

but it is empty

git cat-file -t 4b82
tree

becuse write-tree works from the index.

find .git/objects/ -type f
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//4b/825dc642cb6eb9a060e54bf8d69288fbee4904
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30

it still made an object tough, because we told it to.

classDiagram class d67046{ test content (blob) } class 83baae{ Version 1 (blob) } class 4b825d{ (tree) }

10.4. Updating the Index#

Now, we can add our file as it is to the index.

git update-index --add --cacheinfo 100644 \
 83baae61804e65cc73a7201a7252750c76066a30 test.txt

the \ lets us wrap onto a second line.

  • this the plumbing command git update-index updates (or in this case creates an index, the staging area of our repository)

  • the --add option is because the file doesn’t yet exist in our staging area (we don’t even have a staging area set up yet)

  • --cacheinfo because the file we’re adding isn’t in your directory but is in the database.

  • in this case, we’re specifying a mode of 100644, which means it’s a normal file.

  • then the hash object we want to add to the index (the content) in our case, we want the hash of the first version of the file, not the most recent one.

  • finally the file name of that content

git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Now the file is staged.

Let’s edit it further.

echo "version 2" >> test.txt 

So the file has two lines

cat test.txt 
version 1
version 2

Now check status again.

git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

We added the first version of the file to the staging area, so that version is ready to commit but we have changed the version in our working directory relative to the version from the hash object that we put in the staging area so we also have changes not staged.

We can hash and store this version too.

git hash-object -w test.txt 
0c1e7391ca4e59584f8b773ecdbbb9467eba1547
find .git/objects/ -type f
.git/objects//0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//4b/825dc642cb6eb9a060e54bf8d69288fbee4904
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30
classDiagram class d67046{ test content (blob) } class 83baae{ Version 1 (blob) } class 4b825d{ (tree) } class 0c1e73{ Version 1 Verson 2 (blob) }
git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

ls
test.txt
cat test.txt 
version 1
version 2
git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

Now we can write a tre again and it will have content.

git write-tree
d8329fc1cc938780ffdd9f94e0d364e0ea74f579

Lets examine the tree, first check the type

git cat-file -t d8329
tree

and now we can look at its contents

git cat-file -p d8329
100644 blob 83baae61804e65cc73a7201a7252750c76066a30	test.txt

Now this is the status of our repo:

classDiagram class d67046{ test content (blob) } class 83baae{ Version 1 (blob) } class 4b825d{ (tree) } class d8329f{ blob: 83baae filename: test.txt (tree) } class 0c1e73{ Version 1 Verson 2 (blob) } d8329f --|> 83baae

This only keeps track of th objects, there are also still the HEAD that we have not dealt with and the index.

10.5. Creating a commit manually#

We can echo a commit message through a pipe into the commit-tree plumbing function to commit a particular hashed object.

echo "first commit" | git commit-tree d8329
188a75ef66b6a85be0ab68d8575ec27808881dfc

and we get back a hash. But notice that this hash is unique for each of us. Because the commit has information about the time stamp and our user. The above hash is the one I got during class, but when I re-ran this while typing the notes I got a different hash (d450567fec96cbd8dd514313db9bcb96ad7664b0) even though I have the same name and e-mail because the time changed.

We can also look at its type

git cat-file -t 188a
commit

and we can look at the content

git cat-file -p 188a
tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author Sarah M Brown <brownsarahm@uri.edu> 1677177139 -0500
committer Sarah M Brown <brownsarahm@uri.edu> 1677177139 -0500

first commit

Now we check the final status of our repo

find .git/objects/ -type f
.git/objects//0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//d8/329fc1cc938780ffdd9f94e0d364e0ea74f579
.git/objects//18/8a75ef66b6a85be0ab68d8575ec27808881dfc
.git/objects//4b/825dc642cb6eb9a060e54bf8d69288fbee4904
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30

Important

Check that you also hav 6 objects and 5 of them should match mine, the one you should not have is the 188a75e one but you should have a different one.

classDiagram class d67046{ test content (blob) } class 83baae{ Version 1 (blob) } class 4b825d{ (tree) } class d8329f{ blob: 83baae filename: test.txt (tree) } class 0c1e73{ Version 1 Verson 2 (blob) } class 188a75{ tree d8329f author name commiter time } d8329f --|> 83baae 188a75 --|> d8329f

10.5.1. There are many git objects#

Remember how many objects thre weere in the github inclass repo

cd ../github-in-class-brownsarahm-1/
find .git/objects/ -type f
.git/objects//0d/d61172a7dad0ac78ed6ca1788130a607983973
.git/objects//66/63ed3256515f3b45209c380c76bc2c47225175
.git/objects//57/de0cd87ec14926273fd4760b94c0ea6f5b87d3
.git/objects//03/29e78f9f8cc84f80808b907b044295f1f82594
.git/objects//6a/2e1cc65204d0c05acc477e7d5e704f989a68dc
.git/objects//56/f29cad4c33dc9e5c194e6f774024bca756b6ee
.git/objects//34/6853b8cf891a2783619506221965ffd35ed99d
.git/objects//5f/3e4cc9381a15bf59b928af1381cb5f2518cedb
.git/objects//9d/b284a866b196a79fd633c439b1dce783740137
.git/objects//9c/b0323be29dab2db451e9975fa2e65b37190ff1
.git/objects//d9/328fbe3ec302b590077eee7ad599b7dc246685
.git/objects//bb/4e0d87b6dc7d6c48e5515f6a52cc05948ca9e3
.git/objects//df/d1d047143ec437b431560e2ba8e7eb0c0d514f
.git/objects//a2/fdcfaa06f1c56dac80df0cef908423b425ee4a
.git/objects//bc/dc409a2e646854230114127ab8ad2f336668db
.git/objects//f2/844b2a9c31f83e589a73c1c1b8bd7696c8cf64
.git/objects//e3/cd28cf87c8f6c40529b4687cda50776970c829
.git/objects//cf/fcf05fdee460b85057ae3716ce1f10ac205a22
.git/objects//c1/b4b79b8bdd8e4496ea42bedcf143d22aea7e03
.git/objects//4b/89dffc186f530de628673b199676a097d6d13c
.git/objects//pack/pack-f37aaf1cb9275265234c59965db2dc9400655294.idx
.git/objects//pack/pack-f37aaf1cb9275265234c59965db2dc9400655294.pack
.git/objects//7c/3f99d89aafd49b67e50baf15adae32a3390ca5
.git/objects//45/fcb1dd311e5e45af759cb3627dca5f47f58f04
.git/objects//10/d85a79160a778436092df1706244394ba8376e
.git/objects//81/2245d1177f889dd3ea2e20da6ac104085d25a3
.git/objects//00/96423976e05d13728aa27490d6dd2d2840cc84
.git/objects//98/96f7a7000a7b9d2fdb12047a141524358286c3
.git/objects//5e/373ecaa68320f4fb5cfec60a0d46dd68084693
.git/objects//5b/0dbbf0e8590575ff3fe8e9df38d140ce2e2948
.git/objects//01/69e39a61bde682786d77bcde127fde0ff35d10
.git/objects//6c/96898a9a0a0d5c5f1a38311a919b9b950aa05c
.git/objects//64/0b6939d6dcfb9256e4f070f292f75c32bc8f59
.git/objects//b8/f98b0f8093179f9374bcadd9b6ebe9a5bf274f
.git/objects//ef/45e7786b3fab07e387ed8dda483d536d6fde1f
.git/objects//e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
.git/objects//f7/9afa4312d6903db0c6ad33aaca79a93dadd0c3
.git/objects//e8/2ebb4066f0a6ae070b2f68d13bfbad2f9a6835
.git/objects//fa/eef5ba81b606bf348f1a4e72683b0f8d71b5e4
.git/objects//c5/d4dbaf3808a86af355490f4251ee020d8b7057
.git/objects//c2/8b4ad71b16469c755dba11a1021703a3703212
.git/objects//f8/c5f11ce099748ff751d5535fec38871895a153
.git/objects//70/0ecda4eea6263b886e3ce5faa48ade2079f1d0
.git/objects//23/754587e413b5e75b96665b21b7e9a6404c4026
.git/objects//4f/a9114632f26ec590eec7e91712596085d7c442
.git/objects//12/bbbadd5e636eb345c891e044d4231912085fe4
.git/objects//76/175a40431fa6596e046795f46590208bb04232
.git/objects//47/0d497ce7c645a4646ae7f95999846988546440
.git/objects//78/b2aa7505031c06c109ff8bb2b0a8a1173c0652
.git/objects//8e/2fe11b4ed5ac39b22f680a4c627fd88a6628fd

Back to the test repo

cd ../test/
find .git/objects/ -type f
.git/objects//0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//d8/329fc1cc938780ffdd9f94e0d364e0ea74f579
.git/objects//18/8a75ef66b6a85be0ab68d8575ec27808881dfc
.git/objects//4b/825dc642cb6eb9a060e54bf8d69288fbee4904
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30

and we have no change in the status

git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

This is because git status works off the HEAD file and we have not updated that or set our branch to point to our commit yet. We will start there on Tuesday.

10.6. Review today’s class#

  1. Review the notes

  2. Make a table in gitplumbingreview.md in your KWL repo that relates the two types of git commands we have seen: plubming and porcelain. The table should have two columns, one for each type of commands. Each row should have one git plumbing command and at least one of the corresponding git porcelain command(s). Include three rows: git add, git commit, and git status.

  3. Contribute to your group repo and review a classmate’s contribution. Include a link to your contribution and review in your badge PR comment using markdown link syntax.

[text to display](url/of/link)

10.7. Prepare for Next Class#

  1. Start recording notes on how you use IDEs for the next couple of weeks using the template file in the course notes (will provide prompts and tips). We will come back to these notes in class later, but it is best to record over a time period instead of trying to remember at that time. Store your notes in your kwl repo in idethoughts.md on an ide_prep branch. This is prep for the week after spring break, it does not have to be in the Feb 28 Experience Badge.

  2. make sure that you have a test git repo that matches the notes. this is very important and if you do not have it you will not be able to follow along in class on Feb 28

# IDE Thoughts

## Actions Accomplished
<!-- list what things you do: run code/ edit code/ create new files/ etc; no need to comment on what the code you write does -->


## Features Used
<!-- list features of it that you use, like a file explorer, debugger, etc -->

10.8. More Practice#

  1. Review the notes

  2. Read about git internals to review what we did in class in greater detail. Make gitplumbingdetail.md. Create a visualization that is compatible with version control (eg can be viewed in plain text and compared line by line, such as table or mermaid graph) that shows the relationship between at least three porcelain commands and their corresponding plumbing commands.

  3. Create gitislike.md and explain main git operations we have seen (add, commit, push) in your own words in a way that will either help you remember or how you would explain it to someone else at a high level. This might be analogies or explanations using other programming concepts or concepts from a hobby.

  4. Contribute to your group repo and review a classmate’s contribution. Include a link to your contribution and review in your badge PR comment using markdown link syntax. (view the raw version of this issue page for the git internals link above for an example)

10.9. Experience Report Evidence#

Important

You need to have a test repo that matches this for class on tuesday.

Generate your evidence with the following in your test repo

find .git/objects/ -type f > testobj.md

then append the contents of your commit object to that file.

Move the testobj.md to your kwl repo in the experiences folder.

10.10. Questions After Today’s Class#

10.10.1. Why is Git so fast? It does lots of reading and writing from files which is a (relatively) slow process but does this all very quickly.#

It is all structured files that are designed to be fast. Most are small and they are all written by git, there is no case handling or user decisions to parse.

10.10.2. What scenarios would we use this over the way we have done them before?#

You would not use this way very often. It is helpful to work through it to build up an understanding of what actually happens. It basically allows us to slow down and do things one step at a time.

10.10.3. How does the tree structure properly become formatted without specifying the branch that you want it to start at or be based on?#

The tree is a part of the snapshot of the files, it looks only at what is in the index. That is why our first one was empty. The tree is independent from branches, we will see how branches relate to this next week.

10.10.4. When files that are not staged get staged, does the file thats already staged change?#

When you add a file to the staging area, it writes it to the .git direcotry, but it does not tell you what the hash of it is. So, it is stored, but you have no way to know how to get it back directly.

Remember, in the case we had today, we staged the already hashed version 1 of the file when our working directory already had only version 2.

In our example, we had a version of test.txt that wasn’t staged and version that was committed. If we staged the second version of test.txt what would hapen to the already staged version. Since the first version is committed, we could still get back to that version. Each one of these is an object in the .git directory so it comes down to a matter of finding it in there.

10.10.5. Are hashes just a number indentifier for an object? Is there a way to change them?#

git primarily uses them as an identifier, but it also uses it to verify that the content is what it should be. There is no way to change them becaus that would break the security.

10.10.6. What is the difference between random and deterministic?#

Random means that it comes from probability, for example rolling a die or flipping a coin. Deterministic dervies from determined, it means there is a fixed outcome.

If you run a random function two times with the same input, it can give different values.

If you run a deterministic two times (or as many as you want) it will always have the same value.

10.10.7. How does the connection work between our devices and GitHub?#

git does not require that at all. SO far in the test repo it is all 100% local. git uses regular web protocols HTTPS and SSH to send content to servers.

10.10.8. How is this useful on the day to day?#

Understanding th different types of objects in git means that you will understand what git does and know how to accomplish different goals.

For example, once on a collaborative project, I realized that a collaborator had deleted content that we actually needed and added content we needed to keep in the same commit. Then we had made several more commits that impacted both files. Because I understood how git actually works, I was able to use the staging area and git revert carefully in order to undo the delete part of the commit and keep the adding part of that commit.

10.10.9. How common is it to use git plumbing commands?#

First note that this is still not manually editing the .git directory. We are still relying on git, just smaller bits at a time. It is a thing to do in a pinch, when things are stuck or you get in a tricky thing you need to undo.

10.10.10. When does hashing occur, after any change is made to a file? Or when a git command is ran?#

Only when a git command is run.

10.10.11. What happens if two hashes are the same?#

this is called a collision. We will talk about it next week. git is set up to prevent this.

10.10.12. Is a tree a snapshot of the repo at the time?#

it is a snapshot of everything that is staged. it is the whole files, not only the changes. git status shows us the fils changes, but the tree also points to the corresponding version of the other files.

10.10.13. if hashes generated by git are deterministic, why is it impossible to convert a hash back into the commit’s contents?#

It is an assymetric algorithm. We will talk more about hashes next week to see more about what this means.

10.10.14. How to access remote branches located in/at origin?#

You can use git branch -r to viw remote branches.

10.10.15. Is using the write tree command the same as creating a new branch?#

No, we have not done anything with branches so far. write-tree is one of the things that happens when we do git add