9. How does git really work?
9.1. Creating a repo from scratch
We will start in the top level course directory.
cd systems
ls
Mine looks like this:
2022-09-19 courseutils github-inclass-brownsarahm
Yours should also have your kwl repo, group repo, etc.
We can create an empty repo from scratch using git init
git init test
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint: git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint: git branch -m <name>
Initialized empty Git repository in /Users/brownsarahm/Documents/inclass/systems/test/.git/
It creates a folder and gives us a warning about branch names. You will not see this warning if init.defaultBranch is already set in your git config, for example because your installer configured it.
Fundamentally, the default branch is not special. You can name it whatever you want.
Historically it was called master.
The name derives from a master/slave analogy, which is not even how git works; the terminology was adopted from other projects.
The broader community is moving away from the name as well.
Git allows you to make your default something other than master.
Literally, the person who chose the names “master” and “origin” regrets that choice. The name main is a more accurate and not harmful term, and it is the current convention.
ls
we can see that we have a new directory
2022-09-19 github-inclass-brownsarahm
courseutils test
Now we want to change the name of the default branch
git branch -m main
fatal: not a git repository (or any of the parent directories): .git
but we have to cd into it first.
cd test/
git branch -m main
git status
On branch main
No commits yet
nothing to commit (create/copy files and use "git add" to track)
and we have a completely empty repo.
ls .git/
HEAD description info refs
config hooks objects
we’ve looked at most of these, but we have not been able to see the objects before. We will work with those now.
9.2. Searching the file system
We can use the bash command find to search the file system. Note that this does not search the contents of the files, just the names.
find .git/objects
we have a few items in that directory and the directory itself.
.git/objects
.git/objects/pack
.git/objects/info
We can limit the results by type, to only files, with the -type option set to f
find .git/objects -type f
And we have no results. We have no objects yet.
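To make the difference between the two find calls concrete, here is a minimal Python sketch of the same two searches. It builds a throwaway directory shaped like a fresh .git/objects instead of touching a real repo; the helper names find_all and find_files are made up for this illustration.

```python
import tempfile
from pathlib import Path


def find_all(root):
    """Return root plus everything under it, like plain `find root`."""
    root = Path(root)
    return [root] + sorted(root.rglob("*"))


def find_files(root):
    """Return only regular files under root, like `find root -type f`."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file())


# Build a temporary directory shaped like a fresh .git/objects (an assumption
# for the demo; it is not a real repository).
with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp) / "objects"
    (objects / "pack").mkdir(parents=True)
    (objects / "info").mkdir()

    all_names = [p.relative_to(tmp).as_posix() for p in find_all(objects)]
    only_files = find_files(objects)

print(all_names)   # the directory itself and its two subdirectories
print(only_files)  # empty: there are no files, so no objects yet
```

Both searches look only at names and file types; neither one opens a file to read its contents.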
9.3. Git Objects
There are 3 types:
blob objects: the content of your files (data)
tree objects: stores file names and groups files together (organization)
commit objects: stores information about the sha values of the snapshots
Let’s create our first one. Git uses hashes as the key. We give the hashing function some content, it applies the algorithm and returns us the hash as the reference to that object. We can also write to our database with this.
echo "test content" | git hash-object -w --stdin
git hash-object takes the content you hand to it and, on its own, merely returns the unique key
the -w option then tells the command to also write that object to the database
the --stdin option tells git hash-object to get the content to be processed from stdin instead of a file
the | is called a pipe (what we saw before was a redirect); it pipes one process’s output into the next command. echo would write to stdout; with the pipe, that output is passed to the stdin of git hash-object
Important
Pipes are an important concept too. We’re seeing them in the context of real uses, and we will keep seeing them. Pipes connect the stdout of one command to the stdin of the next.
we get back the hash:
d670460b4b4aece5915caf5c68d12f560a9fe3e4
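To see where that hash comes from, here is a minimal Python sketch of the key computation. It assumes git's default SHA-1 object format, where the hash is taken over a short header ("blob", the content length in bytes, and a null byte) followed by the content. The function name git_blob_hash is made up for this illustration, and the sketch only computes the key; it does not write anything to the database the way -w does.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Compute the key git would assign to a blob with this content."""
    # Header: object type, a space, the content length in bytes, a null byte.
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


# echo adds a trailing newline, so the content is 13 bytes long.
print(git_blob_hash(b"test content\n"))
# d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Because the key depends only on the content, everyone who hashes the same content gets the same key.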
and we can check if it wrote to the database.
find .git/objects -type f
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
and we see the file that it was supposed to have!
We can use cat-file to inspect the object by referencing at least 4 characters from the start of the full hash that are unique, not the file name. (70460 will not work, because the first two characters of the hash, d6, became the directory name)
git cat-file -p d6704
cat-file requires an option; -p is for pretty print
test content
we see it stored the content we expected.
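The abbreviation rule can be sketched as a lookup over the known hashes. This is a toy model, not git's implementation: resolve_prefix is a hypothetical helper, and the list of known hashes is typed in by hand rather than read from .git/objects.

```python
def resolve_prefix(prefix, known_hashes):
    """Return the one full hash that starts with prefix, or raise."""
    if len(prefix) < 4:
        raise ValueError("need at least 4 characters of the hash")
    matches = [h for h in known_hashes if h.startswith(prefix)]
    if not matches:
        raise LookupError(f"no object starts with {prefix}")
    if len(matches) > 1:
        raise LookupError(f"{prefix} is ambiguous")
    return matches[0]


known = ["d670460b4b4aece5915caf5c68d12f560a9fe3e4"]

print(resolve_prefix("d6704", known))  # the start of the hash resolves

try:
    resolve_prefix("70460", known)     # the file name is not the start of the hash
except LookupError as err:
    print("lookup failed:", err)
```

The prefix is matched against the start of the full hash, which is why the file name alone (the hash minus its first two characters) does not resolve.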
cat .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
but without git’s cat-file we cannot read that file, because git stores objects compressed.
xK??OR04f(I-.QH??+I?+?K?
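The unreadable bytes are compressed data. As a minimal sketch, assuming the loose object format (header plus content, compressed with zlib), we can rebuild what gets stored and then undo it in Python. The variable names are made up for this illustration, and the exact compressed bytes may differ from the ones git wrote, since that depends on compression settings.

```python
import hashlib
import zlib

content = b"test content\n"

# What git hashes and stores: a header, a null byte, then the content.
store = b"blob " + str(len(content)).encode() + b"\0" + content
object_id = hashlib.sha1(store).hexdigest()

# The first two characters of the hash become the directory name,
# the remaining 38 become the file name.
path = f".git/objects/{object_id[:2]}/{object_id[2:]}"

compressed = zlib.compress(store)       # roughly what is on disk
restored = zlib.decompress(compressed)  # what cat-file has to undo
header, _, body = restored.partition(b"\0")

print(path)
print(compressed[:1])  # b'x', like the first character cat showed us
print(header, body)
```

This is why plain cat shows noise while git cat-file -p, which decompresses and strips the header, shows the content.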
9.4. Hashing a file
let’s create a file
echo 'version 1' > test.txt
and store it in our database, by hashing it
git hash-object -w test.txt
83baae61804e65cc73a7201a7252750c76066a30
we can look at what we have.
find .git/objects -type f
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
and what is in the working directory.
ls
test.txt
Note
the working directory and the git repo are not strictly the same thing, and can be different like this. Mostly they will stay in a closer relationship than we currently have unless we use plumbing commands, but it is good to build a solid understanding of how the .git directory relates to your working directory.
9.5. Writing to the index
So far, even though we have hashed the object, git still thinks the file is untracked, because it is not in the index, it is not in any tree, and there are no commits that point to a tree containing it.
git status
On branch main
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
test.txt
nothing added to commit but untracked files present (use "git add" to track)
First, let’s edit the file
echo 'version 2' >> test.txt
so it looks like this:
cat test.txt
version 1
version 2
and then hash the new version of the file too.
git hash-object -w test.txt
0c1e7391ca4e59584f8b773ecdbbb9467eba1547
And we can look at our objects again
find .git/objects -type f
.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
We have the string we wrote directly and the two versions of the file.
we can verify the last one that we have not looked at yet.
git cat-file -p 0c1e
version 1
version 2
To add the first version of the file to the index:
git update-index --add --cacheinfo 100644 \
83baae61804e65cc73a7201a7252750c76066a30 test.txt
this is the plumbing command git update-index; it updates (or in this case creates) an index, the staging area of our repository
the --add option is because the file doesn’t yet exist in our staging area (we don’t even have a staging area set up yet)
--cacheinfo because the file we’re adding isn’t in your directory but is in the database
in this case, we’re specifying a mode of 100644, which means it’s a normal file
then the hash of the object we want to add to the index (the content); in our case, we want the hash of the first version of the file, not the most recent one
finally the file name of that content
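The mode 100644 is an octal number in the same style as Unix file modes, so we can decode it with Python's stat module as a quick check. This is only an illustration of how to read the number; git itself accepts just a handful of modes.

```python
import stat

mode = 0o100644  # the mode we passed to --cacheinfo, read as octal

print(stat.S_ISREG(mode))        # True: it is a regular (normal) file
print(stat.filemode(mode))       # -rw-r--r--
print(oct(stat.S_IMODE(mode)))   # 0o644: the permission bits alone
```

The leading 100 marks a regular file, and 644 means the owner can read and write while everyone else can only read.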
Now let’s see what git knows:
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: test.txt
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: test.txt
We added the first version of the file to the staging area, so that version is ready to commit. But we have changed the version in our working directory relative to the version from the hash object that we put in the staging area, so we also have changes not staged.
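One way to reason about this status output is as a three-way comparison between the last commit, the index, and the working directory. The sketch below is a simplified toy model, not how git status is implemented (real git also uses file metadata to avoid rehashing); the dictionaries head_tree, index, and working_dir are hypothetical stand-ins.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """Key git would assign to a blob with this content (SHA-1 format)."""
    return hashlib.sha1(f"blob {len(content)}\0".encode() + content).hexdigest()


head_tree = {}                                       # no commits yet
index = {"test.txt": blob_hash(b"version 1\n")}      # what update-index recorded
working_dir = {"test.txt": b"version 1\nversion 2\n"}  # the file on disk

# Index differs from the last commit: "Changes to be committed"
staged = sorted(p for p, oid in index.items() if head_tree.get(p) != oid)

# Working directory differs from the index: "Changes not staged for commit"
unstaged = sorted(
    p for p, data in working_dir.items()
    if p in index and blob_hash(data) != index[p]
)

# On disk but not in the index: "Untracked files"
untracked = sorted(p for p in working_dir if p not in index)

print(staged, unstaged, untracked)
```

The same file shows up in two sections because each section comes from a different comparison.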
we can see what those changes are with git diff
git diff test.txt
diff --git a/test.txt b/test.txt
index 83baae6..0c1e739 100644
--- a/test.txt
+++ b/test.txt
@@ -1 +1,2 @@
version 1
+version 2
the first few lines tell us what git is doing; it says that it is comparing the content at hash 83baae6 to hash 0c1e739 (as expected, 83baae6 is the one we put in the index; 0c1e739 is the last thing we hashed, which is also still what is in our working directory)
the next three lines label the a version with --- and the b version with +++, and then describe the change quantitatively: @@ -1 +1,2 @@ means the a version has 1 line starting at line 1 and the b version has 2 lines starting at line 1, so the a one is missing lines relative to the b one
the last two lines are the file with changes between the two versions marked; the second version has “version 2” added to it relative to the first one.
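The same unified diff format can be produced with Python's difflib, which makes it easy to experiment with how the @@ line changes. This is a sketch with the two versions typed in by hand; difflib is not the algorithm git uses, and it does not print git's first two header lines, but the format of the remaining lines is the same.

```python
import difflib

# The staged version (a) and the working directory version (b)
staged = "version 1\n".splitlines(keepends=True)
working = "version 1\nversion 2\n".splitlines(keepends=True)

diff_lines = list(
    difflib.unified_diff(staged, working, fromfile="a/test.txt", tofile="b/test.txt")
)
print("".join(diff_lines), end="")
```

In the printed result, lines starting with a space are unchanged context and lines starting with + exist only in the b version.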
9.6. Creating a commit manually
We can echo a commit message through a pipe into the commit-tree plumbing command to commit a particular hashed object.
echo "first commit" | git commit-tree 83baa
fatal: 83baae61804e65cc73a7201a7252750c76066a30 is not a valid 'tree' object
But we can actually only commit tree objects.
We have:
find .git/objects -type f
.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
the -t option of cat-file can tell us the type:
git cat-file 0c1e -t
blob
git cat-file d6704 -t
blob
git cat-file 83baa -t
blob
These are all blob objects, the actual content that we are storing
we can write a tree though:
git write-tree
d8329fc1cc938780ffdd9f94e0d364e0ea74f579
and look at this:
git cat-file -p d8329
100644 blob 83baae61804e65cc73a7201a7252750c76066a30 test.txt
and check its type:
git cat-file -t d8329
tree
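The tree's hash is also determined entirely by its content, which we can check with a short Python sketch. It assumes the SHA-1 object format, where a tree entry is the mode, a space, the file name, a null byte, and then the blob's hash as 20 raw bytes. The helper name git_object_hash is made up for this illustration.

```python
import hashlib


def git_object_hash(obj_type: str, body: bytes) -> str:
    """Hash a git object: type and length header, null byte, then the body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


# The blob for the first version of the file
blob_id = git_object_hash("blob", b"version 1\n")

# One tree entry: mode, name, null byte, then the blob hash in binary form
entry = b"100644 test.txt\0" + bytes.fromhex(blob_id)
tree_id = git_object_hash("tree", entry)

print(blob_id)  # 83baae61804e65cc73a7201a7252750c76066a30
print(tree_id)  # d8329fc1cc938780ffdd9f94e0d364e0ea74f579
```

Nothing about the user or the time goes into a blob or a tree, so anyone who stages the same file content under the same name gets these same two hashes.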
Now that we have a tree object we can commit the tree.
echo "first commit" | git commit-tree d8329
e09139a38f4fd6d82715c32aab9adfed67a87ba5
and we get back a hash. But notice that this hash is unique for each of us, because the commit has information about the time stamp and our user. The above hash is the one I got during class, but when I re-ran this while typing the notes I got a different hash (d450567fec96cbd8dd514313db9bcb96ad7664b0) even though I have the same name and e-mail, because the time changed.
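To see why the time changes the hash, here is a sketch of what goes into a first commit object, assuming the SHA-1 object format: the tree hash, an author line, a committer line, a blank line, and the message. The name, e-mail, time zone, and timestamps below are hypothetical placeholders, so the hashes it prints will not match anyone's real commit.

```python
import hashlib


def commit_hash(tree_id, name, email, timestamp, message, tz="-0400"):
    """Hash a parentless commit built from the given fields."""
    ident = f"{name} <{email}> {timestamp} {tz}"
    body = (
        f"tree {tree_id}\n"
        f"author {ident}\n"
        f"committer {ident}\n"
        f"\n"
        f"{message}\n"
    ).encode()
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


tree = "d8329fc1cc938780ffdd9f94e0d364e0ea74f579"

# Same tree, same (hypothetical) user, same message; only the time differs.
in_class = commit_hash(tree, "Example User", "user@example.com", 1664200000, "first commit")
later = commit_hash(tree, "Example User", "user@example.com", 1664203600, "first commit")

print(in_class)
print(later)
print(in_class == later)  # False
```

The tree line is identical in both commits; the only input that changed is the timestamp, and that alone is enough to produce a different hash.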
Important
verify that you have the same git status and list of objects as below
git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: test.txt
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: test.txt
find .git/objects -type f
.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579
.git/objects/e0/9139a38f4fd6d82715c32aab9adfed67a87ba5
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
we will all have 4 of the same objects and one unique one.
9.7. Review today’s class
Review the notes
For the core “Porcelain” git commands we have used (add, commit), make a table of which git plumbing commands (of those we have seen) they use in gitplumbingreview.md in your KWL repo. There might be multiple Porcelain commands for each plumbing command.
Contribute to your group repo and review a classmate’s contribution
9.8. Prepare for Next Class
Make notes on how you use IDEs for the next couple of weeks using the template file in the course notes (will provide prompts and tips). We will come back to these notes in class later, but it is best to record over a time period instead of trying to remember at that time. Store your notes in your kwl repo in idethoughts.md on an ide_prep branch.
Make sure that you have a test git repo that matches the notes.
# IDE Thoughts
## Actions Accomplished
<!-- list what things you do: run code/ edit code/ create new files/ etc; no need to comment on what the code you write does -->
## Features Used
<!-- list features of it that you use, like a file explorer, debugger, etc -->
9.9. More Practice
Read about git internals to review what we did in class in greater detail. Make gitplumbingdetail.md by copying your gitplumbingreview.md and then add in the full detail including all plumbing commands. Also add one more high level command (revert, reset, pull, fetch) to your table.
Add to your gitplumbingdetail.md file explanations of the main git operations we have seen (add, commit, push) in your own words in a way that will either help you remember or how you would explain it to someone else at a high level. This might be analogies or explanations using other programming concepts or concepts from a hobby. Add this under a subheading (##) with a descriptive title (for example “Git In terms of”).
For one thing your understanding changed on, or an open question you have, look up or experiment to find the answer and contribute the question and answer to the course website.
9.10. Questions After Class
9.10.1. Would any of the pull requests we make in class count towards the Hacktoberfest requirements?
The PRs to your group and KWL repos will not, because those are private. I will make it so that contributions to the course website can count. To be honest, I will only give you each one qualifying PR to the course site for Hacktoberfest.
I will also make the courseutils repo qualify and create some issues there to outline improvements to it that I want to make. That would require some python knowledge too.
The Hacktoberfest participation page also has links to good repos to contribute to.
9.10.2. How do you make a local repo that’s private on GitHub?
We will push this repo, which was created locally, to GitHub next week.
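9.10.3. Are hash algorithms supposed to always output a unique value?
Ideally, yes: each different content you put in should give a different output. Two different contents producing the same hash is possible in principle, but for the hashes git uses it is unlikely enough by accident that git treats the hash as a unique key. Hashing functions are also repeatable: if you put the same content in, you get the same hash out.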
9.10.4. How did you set up the courseutils for tracking? I just want to know out of curiosity.
It’s a public repo of python code that I made installable. You can look at the source code if you would like. If you have questions, you can post an issue there or use office hours.