How does git create a commit?

10. How does git create a commit?#

Today we will dig into how git really works. This will be a deep dive and provide a lot of details about how git creates a commit. It will conceptually reinforce important concepts and practically give you some ideas about how you might fix things when things go wrong.

Later, we will build on this more on the practical side, but these concepts are very important for making sense of the more practical aspects of fixing things in git.

This deep dive in git is to help you build a correct, flexbile understanding of git so that you can use it independently and efficiently. The plumbing commands do not need to be a part of your daily use of git, but they are the way that we can dig in and see what actually happens when git creates a commit.

Inspecting a system’s components is a really good way to understand it and correctly understanding it will impact your ability to ask good questions and even look up the right thing to do when you need to fix things.

Also, looking at the parts of git is a good way to reinforce specific design patterns that are common in CS in a practical way. This means that today we will also:

get more practice with bash
see a practical example of hashing
reinforce through examples what a pointer does

10.1. Review to set the stage#

Let’s look at the github inclass repo.

cd Documents/inclass/systems/github-inclass-fa23-brownsarahm/

Recall: git stores important content in files that it uses like variables.

All of gits files, that is uses to perform the version control system actions are in the .git directory. Since this starts with a . it is “hidden” so to see it we use the -a option on ls.

ls -a

.			LICENSE.md		helper_functions.py
..			README.md		important_classes.py
.git			about.md		setup.py
.github			abstract_base_class.py	tests
API.md			alternative_classes.py
CONTRIBUTING.md		docs

Inside there we can see the files:

ls .git

COMMIT_EDITMSG	REBASE_HEAD	index		packed-refs
FETCH_HEAD	config		info		refs
HEAD		description	logs
ORIG_HEAD	hooks		objects

For example:

cat .git/HEAD 

ref: refs/heads/main

holds the current branch and points to another file that holds the commit value

cat .git/refs/heads/main 

042a42eb47c33ee43d793feb4d891a93e7460527

we also see the config file

cat .git/config 

[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true
[remote "origin"]
	url = https://github.com/introcompsys/github-inclass-fa23-brownsarahm.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
	remote = origin
	merge = refs/heads/main
[branch "1-create-an-about-file"]
	remote = origin
	merge = refs/heads/1-create-an-about-file
[branch "fun_fact"]
	remote = origin
	merge = refs/heads/fun_fact

that stores information about the different branches and remotes.

ls -a

.			LICENSE.md		helper_functions.py
..			README.md		important_classes.py
.git			about.md		setup.py
.github			abstract_base_class.py	tests
API.md			alternative_classes.py
CONTRIBUTING.md		docs

.gitignore is a file in the working direcotry that contains alist of files and patterns to not track.

cd ../tiny-book/
ls -a

.			_config.yml		markdown.md
..			_toc.yml		notebooks.ipynb
.git			intro.md		references.bib
.gitignore		logo.png		requirements.txt
_build			markdown-notebooks.md

cat .gitignore 

_build/

Remmeber the .gitignore file lives outside of the .git directory because it is intended to be manually edited by the user, instead of written to by the git program.

10.2. Creating a repo from scratch#

We will start in the top level course directory.

cd ..

ls

fa23-kwl-brownsarahm		tiny-book
github-inclass-fa23-brownsarahm

Yours should also have your group repo, your messy folder, etc.

We can create an empty repo from scratch using git init <path>

Last time we used an existing directory like git init .

We will make a new directory for our repo:

git init test

hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint: 
hint: 	git config --global init.defaultBranch <name>
hint: 
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint: 
hint: 	git branch -m <name>
Initialized empty Git repository in /Users/brownsarahm/Documents/inclass/systems/test/.git/

We can see what happened:

ls

fa23-kwl-brownsarahm		test
github-inclass-fa23-brownsarahm	tiny-book

we see the new directory

then we can enter it

cd test/

and then rename the branch

git branch -m main

As usual, we will look at the status

git status

On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)

Notice that there are no commits, and no origin.

ls .git

HEAD		description	info		refs
config		hooks		objects

10.3. Searching the file system#

We can use the bash command find to search the file system note that this does not search the contents of the files, just the names.

find .git/objects/

.git/objects/
.git/objects//pack
.git/objects//info

we have a few items in that directory and the directory itself.

We can limit by type, to only files with the -type option set to f

find .git/objects/ -type f

And we have no results. We have no objects yet. Because this is an empty repo

10.4. Creating Git Blob Objects Directly#

There are 3 types:

blob objects: the content of your files (data)
tree objects: stores file names and groups files together (organization)
Commit Objects: stores information about the sha values of the snapshots

classDiagram class tree{ List: - hash: blob - string: type - string:file name } class commit{ hash: parent hash: tree string: message string: author string: time } class blob{ binary: contents } class object{ hash: name } object <|-- blob object <|-- tree object <|-- commit

All git objects are files stored with the name that is the hash of the content in the file

Let’s create our first git object. git uses hashes as the key. We give the hashing function some content, it applies the algorithm and returns us the hash as the reference to that object. We can also write to our .git directory with this.

The git hash-object function works on files, but we do not have any files yet. We can create a file, but we do not have to. Remememer, everything is a file.

We can put content into the stdout file with echo

echo "test content"

test content

which shows on our terminal. We can us a pipe to connect the stdout of on command to the stdin of the next.

and we can use a pipe to connect std out of one command to stdin of the next command. Then we can use the --stdin option to tell git hash-object to read from there.

echo "test content" | git hash-object -w --stdin

We can break down this command:

git hash-object would take the content you handed to it and merely return the unique key
-w option tells the command to also write that object to the database
--stdin option tells git hash-object to get the content to be processed from stdin instead of a file
the | is called a pipe (what we saw before was a redirect) it pipes a process output into the next command
echo would write to stdout, withthe pip it passes that to std in of the git-hash

d670460b4b4aece5915caf5c68d12f560a9fe3e4

and we can check if it wrote to the directory.

find .git/objects/ -type f

.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

and we see a file that it was supposed to have!

This file contains binary content that is hard to read. Fortunately, git provides a utility. We can use cat-file to use the object by referencing at least 4 characters that are unique from the full hash, not the file name.

cat-file requires an option, we have used 2 so far:

-p is for pretty print
-t is for type

We can then view the type of this object

git cat-file -t d670

blob

and its actual content

git cat-file -p d670

test content

This is the content that we put in, as expected.

10.4.1. Hashing a file#

To hash content that corresponds to a file, we have to have a file first. We can use a redirect > to send content from echo to a different file.

echo "version 1" > test.txt

and store it, by hashing it

git hash-object -w test.txt 

83baae61804e65cc73a7201a7252750c76066a30

we can look at what we have.

find .git/objects/ -type f

.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30

Now this is the status of our repo.

classDiagram class d67046{ + "test content" +(blob) } class 83baae{ + Version 1 + (blob) }

We can check the type of files with -t and git cat-file

git cat-file -t 83ba

blob

git cat-file -t d670

blob

We have hashed content, but our git status is still saying no commits and only one untracked file.

git status

On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	test.txt

nothing added to commit but untracked files present (use "git add" to track)

So far, even though we have hashed the object, git still thinks the file is untracked, because it is not in the tree and there are no commits that point to that part of the tree.

the workign directory and the git repo are not strictly the same thing, and can be different like this. Mostly they will stay in closer relationship that we currently have unless we use plumbling commands, but it is good to build a solid understanding of how the .git directory relates to your working directory.

Notice, that we only have one file in the working directory.

ls

test.txt

10.5. Writing a tree#

We can write a tree with the write-tree command

git write-tree

4b825dc642cb6eb9a060e54bf8d69288fbee4904

we can see the object exists

find .git/objects/ -type f

.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//4b/825dc642cb6eb9a060e54bf8d69288fbee4904
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30

we can verify that it is a tree

git cat-file -t 4b82

tree

and look at the tree contents

git cat-file -p 4b82

but it is empty.

We have a file that was created, git write-tree did something but we told it to write a file without putting anything in the index, or staging area, that it reads from.

classDiagram class d67046{ +"test content" +(blob) } class 83baae{ +Version 1 +(blob) } class 4b825d{ +(tree) }

Now, we can add our file and its hashed object as it is to the index.

git update-index --add --cacheinfo 100644 \
 83baae61804e65cc73a7201a7252750c76066a30 test.txt

the \ lets us wrap onto a second line.

this the plumbing command git update-index updates (or in this case creates an index, the staging area of our repository)
the --add option is because the file doesn’t yet exist in our staging area (we don’t even have a staging area set up yet)
--cacheinfo because the file we’re adding isn’t in your directory but is in the database.
in this case, we’re specifying a mode of 100644, which means it’s a normal file.
then the hash object we want to add to the index (the content) in our case, we want the hash of the first version of the file, not the most recent one.
finally the file name of that content

10.6. A staging and snapshot edge case#

Let’s check git status again:

git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Now the file is staged.

Let’s edit it further.

echo "version 2" >> test.txt

So the file has two lines

cat test.txt 

version 1
version 2

Now check status again.

git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

We added the first version of the file to the staging area, so that version is ready to commit but we have changed the version in our working directory relative to the version from the hash object that we put in the staging area so we also have changes not staged.

We can hash and store this version too.

git hash-object -w test.txt 

0c1e7391ca4e59584f8b773ecdbbb9467eba1547

and verify a new object was created?

find .git/objects/ -type f

.git/objects//0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//4b/825dc642cb6eb9a060e54bf8d69288fbee4904
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30

visually we now have 4 objects:

plain content
2 versions of the file
an empty tree

classDiagram class d67046{ +"test content" +(blob) } class 83baae{ +Version 1 +(blob) } class 4b825d{ +(tree) } class 0c1e73{ +Version 1 +Verson 2 +(blob) }

Again, let’s check in

git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

Hashing the content does not change the status here at all, because git status only traces from the HEAD pointer, compares that commit’s tree to the index and the current working directory. We have no commits still and the index has not changed by hashing the content and creating a blob for it.

10.7. Writing a tree#

Now we can write a tre and it will have content.

git write-tree

d8329fc1cc938780ffdd9f94e0d364e0ea74f579

Lets examine the tree, first check the type

git cat-file -t d832

tree

and now we can look at its contents

git cat-file -p d832

100644 blob 83baae61804e65cc73a7201a7252750c76066a30	test.txt

it tells us the mode of the single file is a regular file, that it is a blob, the hash where the file’s content is snapshotted and the file name.

classDiagram class d67046{ +"test content" +(blob) } class 83baae{ +Version 1 +(blob) } class 4b825d{ +(tree) } class d8329f{ +blob: 83baae +filename: test.txt +(tree) } class 0c1e73{ +Version 1 +Verson 2 +(blob) } d8329f --|> 83baae

This only keeps track of th objects, there are also still the HEAD that we have not dealt with and the index.

Now two of our objects are linked! the tree points to the blob of the first version of the file. We are getting closer to what would happend when we make a commit with a porcelain command.

git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

10.8. Creating a commit manually#

We can echo a commit message through a pipe into the commit-tree plumbing function to commit a particular hashed object.

echo "first commit" | git commit-tree d8329

0ec6f39c964a3149d426ae45bef4385e14444b2c

and we get back a hash. But notice that this hash is unique for each of us. Because the commit has information about the time stamp and our user.

The above hash is the one I got during class, but if I rerun I will get a different hash even though I have the same name and e-mail because the time changed.

Important

You can find the hash of your commit object later by comparing to mine and getting the one that is unique or by checkign the type of all of your objects until you find the one that is a commit.

We can also look at its type

git cat-file -t 0ec6

commit

and we can look at the content

git cat-file -p 0ec6

tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579
author Sarah M Brown <brownsarahm@uri.edu> 1697131751 -0400
committer Sarah M Brown <brownsarahm@uri.edu> 1697131751 -0400

first commit

Now we check the final status of our repo

git status

On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

we see it is “on main” this is because we set the branch to main , but since we have not written there, we have to do it directly. Notice that when we use the porcelain command for commit, it does this automatically; the porcelain commands do many things.

Important

Check that you also have 6 objects and 5 of them should match mine, the one you should not have is the 0ec6f one but you should have a different hash for your commit.

classDiagram class d67046{ +test content +(blob) } class 83baae{ +Version 1 +(blob) } class 4b825d{ +(tree) } class d8329f{ +blob: 83baae +filename: test.txt +(tree) } class 0c1e73{ +Version 1 +Verson 2 +(blob) } class 0ec6f{ +tree d8329f +author name +commit time } d8329f --|> 83baae 0ec6f --|> d8329f

It still says no commits, even though we just verified that it has one.

10.9. Git references#

Remember git status compares the working directory to the current state of the active branch

we cansee the working directory with: ls
we can see the active branch in the HEAD file
what is its status?

Notice, git said we have no commits yet even though we have written a commit.

In our case because we made the commit manually, we did not update the branch.

This is because the main branch does not point to any commit.

We can verify by looking at the HEAD file

cat .git/HEAD

ref: refs/heads/main

and see that:

cat .git/refs/heads/main

cat: .git/refs/heads/main: No such file or directory

which does not even exist!

we can see the objects though:

find .git/objects/ -type f

.git/objects//0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects//0e/c6f39c964a3149d426ae45bef4385e14444b2c
.git/objects//d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects//d8/329fc1cc938780ffdd9f94e0d364e0ea74f579
.git/objects//4b/825dc642cb6eb9a060e54bf8d69288fbee4904
.git/objects//83/baae61804e65cc73a7201a7252750c76066a30

and we can create that file by putting the full hash of the commit into the file

echo 0ec6f39c964a3149d426ae45bef4385e14444b2c > .git/refs/heads/main

so that

cat .git/refs/heads/main

shows the hash:

0ec6f39c964a3149d426ae45bef4385e14444b2c

Now we check status and we are all set!

git status

On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

no changes added to commit (use "git add" and/or "git commit -a")

10.10. Prepare for Next Class#

Read the notes from October 12. We will build on these directly in the future. You need to have the test repo with the same status for lab on 10/13 and class on 10/17 Make sure you have completed all of the steps in the github inclass repo from September too.
Start recording notes on how you use IDEs for the next couple of weeks using the template file below. We will come back to these notes in class later, but it is best to record over a time period instead of trying to remember at that time. Record which IDE(s) you use, what tasks you do, what features you use, what extensions, etc. Store your notes in your kwl repo in idethoughts.md on an ide_prep branch. This is prep for much later, it does not go in the October 17 experience badge branch

10.11. Review today’s class#

Make a table in gitplumbingreview.md in your KWL repo that relates the two types of git commands we have seen: plubming and porcelain. The table should have two columns, one for each type of command. Each row should have one git plumbing command and at least one of the corresponding git porcelain command(s). Include two rows: add and commit.
Contribute to your group repo and review a classmate’s contribution. Include a link to your contribution and review in your badge PR comment using markdown link syntax. Your contribution can be a short how to with a code excerpt or a resource. Include a link to your contribution and review in your badge PR comment using markdown link syntax:

[text to display](url/of/link)

10.12. More Practice#

Read more details about git internals to review what we did in class in greater detail. Make a file gitplumbingdetail.md and create a visualization that is compatible with version control (eg can be viewed in plain text and compared line by line, such as table or mermaid graph) that shows the relationship between at least three porcelain commands and their corresponding plumbing commands.
Create gitislike.md and explain main git operations we have seen (add, commit, push) in your own words in a way that will either help you remember or how you would explain it to someone else at a high level. This might be analogies or explanations using other programming concepts or concepts from a hobby.
Contribute to your group repo and review a classmate’s contribution. Include a link to your contribution and review in your badge PR comment using markdown link syntax. Your contribution can be a short how to with a code excerpt or a resource. (view the raw version of this issue page for the git internals link above for an example)

10.13. Experience Report Evidence#

Put your list of objects in your test repo into a file test_repo_ojbects.txt in your Experience report branch.

10.14. Questions After Today’s Class#

10.14.1. Can a pointer to an old file exist as an object in the git objects?#

a tree will have a pointer to the previous version of the file yes.

10.14.2. Is there a manual way to do other commands like git push and git add?#

We did git add today. We hashed the object and added the hash to the index. Push does have some other steps in it, those are in the practice badge.

10.14.3. Why you are allowed to create empty trees if it is basically a wasted file?#

Because it does not hurt anything.

10.14.4. How are pipes and redirect different?#

Pipes connect stdout to stdin, to connect commands. Redirect just changes the source.

10.14.5. When do people manually create git objects and commits?#

Manually exploring is not a common scenario. These commands exist primarily for git itself, remember modular code gives more power.

We went through doing it manually to see and understand. You can end up in edge case scenarios in other ways, but using the plubming commands is the best way to to see that.

10.14.6. Why are these commands so easy to confuse and mix up with each other? I have a hard time remembering some of the longer commands.#

Some of the commands we have used are detailed low frequency use commands that you do not need to memorize.

10.14.7. Is there a quicker way to find the onbject type if you get lost. Instead of going through each of them with the “git cat-file -t” command?#

For this activity, you can copmare to mine. In a normal use scenario, you would not do this manually, so you would have some other way to know what commit or object you were looking for.

10.14.8. what are hashes?#

A hash is a fixed length representation of variable lenght input. We will see more about them in the coming weeks.

10.14.9. What are trees?#

Trees are one type of git object, that contains information about the directory at a given moment. It contains pointers to the blob objects andinformation about each file.

10.14.10. How do I change my inclass branch from master to main??#

git branch -m main

10.14.11. What is the difference between the -t and -p flags? I noticed them while going through the comnmands.#

these are options on git cat-file. They’re defined above and at the link.

10.14.12. Would there be any reason to use these commands rather than relying on the porcelain commands?#

Not if things are going well, but they could be used to try to figure out what went wrong if you find a repo in a weird state.

10.14.13. when would we use this over doing it on GitHub itself aside from offline?#

We cannot run these detailed steps on github in browser. Recall, we do not even directly “add” files there. In browser, we have to commit in order to save a file.

10.14.14. Why does the file still have untracked changes?#

We hashed the file, but did not put the second hash into the tree or create a second commiit.