9. How does git really work?#

9.1. Creating a repo from scratch#

We will start in the top level course directory.

cd systems
ls

Mine looks like this:

2022-09-19			courseutils			github-inclass-brownsarahm

Yours should also have your kwl repo, group repo, etc.

We can create an empty repo from scratch using git init

git init test
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint: 	git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint: 	git branch -m <name>
Initialized empty Git repository in /Users/brownsarahm/Documents/inclass/systems/test/.git/

It creates a folder and gives us a warning about branch names. If you have a new install you will not see this, because new versions of git have this by default.

fundamentlaly the default branch is not special. you can name it whatever you want.

Historically it was called master.

derived from a master/slave analogy which is not even how git works, but was adopted terminology from other projects

GitHub no longer does

the broader community is as well

git allows you to make your default not be master

literally the person who chose the names “master” and “origin” regrets that choice the name main is a more accurate and not harmful term and the current convention.

ls

we can see that we hav ea new directory

2022-09-19			github-inclass-brownsarahm
courseutils			test

Now we want to change the name of the default branch

git branch -m main
fatal: not a git repository (or any of the parent directories): .git

but we have to cd into it first.

cd test/
git branch -m main
git status
On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)

and we have a completely empty repo.

ls .git/
HEAD		description	info		refs
config		hooks		objects

we’ve looked at most of these, but we have not been able to see the objects before. We will work with those now.

9.2. Searching the file system#

We can use the bash command find to search the file system note that this does not search the contents of the files, just the names.

find .git/objects

we have a few items in that directory and the directory itself.

.git/objects
.git/objects/pack
.git/objects/info

We can limit by type, to only files with the -type option set to f

find .git/objects -type f

And we have no results. We have no objects yet.


9.3. Git Objects#

There are 3 types:

  • blob objects: the content of your files (data)

  • tree objects: stores file names and groups files together (organization)

  • Commit Objects: stores information about the sha values of the snapshots

Let’s create our first one. git uses hashes as the key. We give the hashing function some content, it applies the algorithm and returns us the hash as the reference to that object. We can also write to our database wit this.

echo "test content" | git hash-object -w --stdin
  • git hash-object would take the content you handed to it and merely return the unique key

  • w option then tells the command to also write that object to the database

  • –stdin option tells git hash-object to get the content to be processed from stdin instead of a file

  • the | is called a pipe (what we saw before was a redirect) it pipes a process output into the next command

  • echo would write to stdout, withthe pip it passes that to std in of the git-hash

Important

pipes are an important content too. we’re seeing them in context of real uses, and we will keep seing them. Pipes connec tthe std out of one command t othe std in of the next.

we get back the hash:

d670460b4b4aece5915caf5c68d12f560a9fe3e4

and we can check if it wrote to the database.

find .git/objects -type f
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

and we see a file that it was supposed to have!

We can use cat-file to use the object by referencing at least 4 characters that are unique from the full hash, not the file name. (70460 will not work)

git cat-file -p d6704

cat-file requires an option -p is for pretty print

test content

we see it stored the contet we expected.

cat .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4

but without git’s cat-file’ we cannot read that file.

xK??OR04f(I-.QH??+I?+?K?	```

9.4. Hashing a file#

let’s create a file

echo 'version 1' > test.txt

and store it in our database, by hashing it

git hash-object -w test.txt
83baae61804e65cc73a7201a7252750c76066a30

we can look at what we have.

find .git/objects -type f
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30

and what is in the working directory.

ls
test.txt

Note

the workign directory and the git repo are not strictly the same thing, and can be different like this. Mostly they will stay in closer relationship that we currently have unless we use plumbling commands, but it is good to build a solid understanding of how the .git directory relates to your working directory.

9.5. Writing to the index#

So far, even though we have hashed the object, git still thinks the file is untracked, because it is not in the tree and there are no commits that point to that part of the tree.

git status
On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	test.txt

nothing added to commit but untracked files present (use "git add" to track)

First, let’s edit the file

echo 'version 2' >> test.txt

so it looks like this:

cat test.txt
version 1
version 2

and then hash the new version of the file too.

git hash-object -w test.txt
0c1e7391ca4e59584f8b773ecdbbb9467eba1547

And we can look at our objects again

find .git/objects -type f
.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30

We have the string we wrote directly and the two versions of the file.

we can verify the last one that we have not looked at yet.

git cat-file -p 0c1e
version 1
version 2

TO add this to the index

git update-index --add --cacheinfo 100644 \
  83baae61804e65cc73a7201a7252750c76066a30 test.txt
  • this the plumbing command git update-index updates (or in this case creates an index, the staging area of our repository)

  • the --add option is because the file doesn’t yet exist in our staging area (we don’t even have a staging area set up yet)

  • --cacheinfo because the file we’re adding isn’t in your directory but is in the database.

  • in this case, we’re specifying a mode of 100644, which means it’s a normal file.

  • then the hash object we want to add to the index (the content) in our case, we want the hash of the first version of the file, not the most recent one.

  • finally the file name of that content

Now let’s see what git knows:

git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

We added the first version of the file to the staging area, so that version is ready to commit but we have changed the version in our working directory relative to the version from the hash object that we put in the staging area so we also have changes not staged.

we can see what those changes are with git diff

git diff test.txt
diff --git a/test.txt b/test.txt
index 83baae6..0c1e739 100644
--- a/test.txt
+++ b/test.txt
@@ -1 +1,2 @@
version 1
+version 2

the first few lines tell us what git is doing; it says that it is comparing the content at hash 83baae6 to hash 0c1e739 (as expected, 83baae6 is the one we put in the index; 0c1e739 is the last thing we hashed, which is also still what is our working directory)

the next three lines say that we the a one is missing lines relative to the b one and a quantitative desription.

the last two lines are the file with changes between the two versions marked the second version has “version 2” added to it relative to the first one.

9.6. Creating a commit manually#

We can echo a commit message through a pip into the commit-tree plumbing function to cmmit a particular hashed object.

echo "first commit" | git commit-tree 83baa
fatal: 83baae61804e65cc73a7201a7252750c76066a30 is not a valid 'tree' object

But we can actually only commit tree objects

we have

find .git/objects -type f
.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30

the -t option of cat-file can tell us the type:

git cat-file 0c1e -t
blob
git cat-file d6704 -t
blob
git cat-file 83baa -t
blob

These are all blob objects, the actual content that we are storing

we can write a tree though:

git write-tree
d8329fc1cc938780ffdd9f94e0d364e0ea74f579

and look at this:

git cat-file -p d8329
100644 blob 83baae61804e65cc73a7201a7252750c76066a30	test.txt

and check its type:

git cat-file -t d8329
tree

Now that we hav ea tree object we can commit the tree.

echo "first commit" | git commit-tree d8329
e09139a38f4fd6d82715c32aab9adfed67a87ba5

and we get back a hash. But notice that this hash is unique for each of us. Because the commit has information about the time stamp and our user. The above hash is the one I got during class, but when I re-ran this while typing the notes I got a different hash (d450567fec96cbd8dd514313db9bcb96ad7664b0) even though I have the same name and e-mail because the time changed.

Important

verify that you have the same git status and list of objects as below

git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   test.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   test.txt

find .git/objects -type f
.git/objects/0c/1e7391ca4e59584f8b773ecdbbb9467eba1547
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
.git/objects/d8/329fc1cc938780ffdd9f94e0d364e0ea74f579
.git/objects/e0/9139a38f4fd6d82715c32aab9adfed67a87ba5
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30

we will all have 4 of the same objects and one unique one.

9.7. Review today’s class#

  1. Review the notes

  2. For the core “Porcelain” git commands we have used (add, commit), make a table of which git plumbing commands (of those we have seen) they use in gitplumbingreview.md in your KWL repo. it might be multiple Porcelain for each plumbing.

  3. Contribute to your group repo and review a classmate’s contribution

9.8. Prepare for Next Class#

  1. Make notes on how you use IDEs for the next couple of weeks using the template file in the course notes (will provide prompts and tips). We will come back to these notes in class later, but it is best to record over a time period instead of trying to remember at that time. Store your notes in your kwl repo in idethoughts.md on an ide_prep branch.

  2. make sure that you have a test git repo that matches the notes.

# IDE Thoughts

## Actions Accomplished
<!-- list what things you do: run code/ edit code/ create new files/ etc; no need to comment on what the code you write does -->


## Features Used
<!-- list features of it that you use, like a file explorer, debugger, etc -->

9.9. More Practice#

  1. Read about git internals to review what we did in class in greater detail. Make gitplumbingdetail.md by copying your gitplumbingreview.md and then add in the full detail including all plumbing commands. Also add one more high level command (revert, reset, pull, fetch) to your table.

  2. Add to your gitplumbingdetail.md file explanations of the main git operations we have seen (add, commit, push) in your own words in a way that will either help you remember or how you would explain it to someone else at a high level. This might be analogies or explanations using other programming concepts or concepts from a hobby. Add this under a subheading ## with a descriptive title (for example “Git In terms of ”)

  3. For one thing your understanding changed or an open question you, look up or experiment to find the answer and contribute the question and answer to the course website.

9.10. Questions After Class#

9.10.1. Would any of the pull requests we make in class count towards the Hacktoberfest requirements?#

The PRs to your group and KWL repos will not, because those are private. I will make it so that contributions to the course website can count. To be honest, I will only give you each one qualifying PR to the course site for Hacktoberfest.

I will also though make the courseutils repo qualify and create some issues there to outline improvements to it that I want to make. That would require some python knowledge too.

The Hacktoberfest participation page also has links to good repos to contribute to.

9.10.2. how do you make a local repo that’s private on github?#

We will push this repo from beign created locally to GitHub next week.

9.10.3. Are hash algorithms supposed to always output a unique value?#

Yes for each content you put in, it should gibe a unique output. If you put the same content in, most of the time you want a repeatable hashing function

9.10.4. How did you set up the courseutils for tracking. I just want to know out of curiosity.#

It’s a public repo of python code that I made installable. You can look at the source code if you would like. If you have questions, you can post an issue there or use office hours.

9.10.5. are hash objects exclusively backend mechanics, or can we use our knowledge of them to our advantage”#