7. What is git?#

7.1. Admin#

get credit for this class toward your major

  • substitution of: CSC392 Intro to Computer Systems For: 300 level elective

  • no rationale

  • send the form to Professor Dipippo

Participate in research:see Prismia for a survey link

7.2. Git is a File system#

Important

git book is the official reference on git.

this includes other spoken languages as well if that is helpful for you.

git is fundamentally a content-addressable filesystem with a VCS user interface written on top of it.

Content-addressable filesystem means a key-value data store.

what types of programming have you seen that use key- value pairs?
  • python dictionaries

  • pointers (address,content)

  • parameter, passed values

What this means is that you can insert any kind of content into a Git repository, for which Git will hand you back a unique key you can use later to retrieve that content.

7.3. Git is a Version Control System#

git stores snapshots of your work each time you commit.

snapshot of 5 version sof 3 files

it uses 3 stages:

3 stages in git

7.4. Git has two sets of commands#

Porcelain: the user friendly VCS

Plumbing: the internal workings- a toolkit for a VCS

We have so far used git as a version control system. A version control system, in general, will have operations like commit, push, pull, clone. These may work differently under the hood or be called different things, but those are what something needs to have in order to keep track of different versions.

The plumbing commands reveal the way that git performs version control operations. This means, they implement file system operations for the git version control system.

You can think of the plumbing vs porcelain commands like public/private methods. As a user, you only need the public methods (porcelain commands) but those use the private ones to get things done (plumbing commands). We will use the plumbing commands over the next few classes to examine what git really does when we call the porcelain commands that we will typically use.

7.5. Git is distributed#

What does that mean?

Git runs locally. It can run in many places, and has commands to help sync across remotes, but git does not require one copy of the repository to be the “official” copy and the others to be subordinate. git just sees repositories. For human reasons, we like to have one “official” copy and treat the others as local copies, but that is a social choice, not a technological requirement of git. Even though we will typically use it with an offical copy and other copies, having a tool tht does not care, makes the tool more flexible and allows us to create workflows, or networks of copies that have any relationship we want.

It’s about the workflows, or the ways we socially use the tool.

Some example workflows

7.5.1. Subversion WOrkflow#

subversion workflow

7.5.2. Integration Manager#

integration manager workflow

7.5.3. dictator and lieutenants#

dictator and lieutenants workflow

7.6. Anatomy of Git#

Let’s look at git again in our github-inclass repo

As always, start by checking where we are:

git status
On branch 2-create-an-about-file
Your branch is up to date with 'origin/2-create-an-about-file'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	hello.md

nothing added to commit but untracked files present (use "git add" to track)

I was not on the main branch, so I’ll switch back there

git checkout main
Switched to branch 'main'
Your branch is up to date with 'origin/main'.

Remember, we can see hidden files with the -a option.

ls -a
.		.git		README.md	hello.md
..		.github		about.md

of the structure of git is contained in the .git directory, that is called the “database” this contains

cd .git
ls
COMMIT_EDITMSG	ORIG_HEAD	description	info		packed-refs
FETCH_HEAD	REBASE_HEAD	hooks		logs		refs
HEAD		config		index		objects

We see a lot of files here. The most important ones are:

name

type

purpose

objects

directory

the content for your database

refs

directory

pointers into commit objects in that data (branches, tags, remotes and more)

HEAD

file

points to the branch you currently have checked out

index

file

stores your staging area information.

7.6.1. Git HEAD#

cat HEAD
ref: refs/heads/main

this indeed is the branh we have checked out and we can confirm with git status

git status
fatal: this operation must be run in a work tree

excpet we cannot call this in the .git directory, we have to got back up

cd ..
git status
On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	hello.md

nothing added to commit but untracked files present (use "git add" to track)

and we indeed see that it matches.

We can also confirm that it changes, that it’s not just always main by changing branches. We can first list the branches

git branch
  2-create-an-about-file
* main
  organtzation

then switch to one of the others

git checkout 2-create-an-about-file
Switched to branch '2-create-an-about-file'
Your branch is up to date with 'origin/2-create-an-about-file'.

And again examine HEAD

cat .git/HEAD
ref: refs/heads/2-create-an-about-file

it indeed changes.

Next, what is that file that the HEAD file points to?

cat .git/refs/heads/main
3c9980a4883782386ea3112d521b1b95997b0be6

this is a commit, the last one that was added to the branch we checked.

To confirm we switch back to main

git checkout main
Switched to branch 'main'
Your branch is up to date with 'origin/main'.

and check the commit history

git log
commit 3c9980a4883782386ea3112d521b1b95997b0be6 (HEAD -> main, origin/main, origin/HEAD)
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Wed Sep 14 17:40:01 2022 -0400

    create about

commit ec3dd020d2767e45b6cd61f7c4ea5f6f2be9a3d8
Merge: 1613072 db2e41d
Author: Sarah Brown <brownsarahm@uri.edu>
Date:   Wed Sep 14 16:56:34 2022 -0400

    Merge pull request #4 from introcompsys/create_readme

    Create README.md

commit db2e41d48129b2c3ad09b78f1c14c2cd295e3eb2 (origin/create_readme)
Author: Sarah Brown <brownsarahm@uri.edu>
Date:   Wed Sep 14 16:55:23 2022 -0400

    Create README.md

commit 1613072525c141b13d3ac7db68e8c1dbee70496b
Author: github-classroom[bot] <66690702+github-classroom[bot]@users.noreply.github.com>
Date:   Wed Sep 14 20:51:29 2022 +0000

    Initial commit

Try it Yourself

use git log to draw a map that shows where the different branches are relative to one another

Also notice that next to the commits, it lists what branches point to that commit.

We can also see the whole list of branches

git branch
  2-create-an-about-file
* main
  organtzation

and compare that to the refs/head directory.

ls .git/refs/heads/
2-create-an-about-file	main			organtzation

which they do match.

there are other types of refs

ls .git/refs/
heads	remotes	tags

tags are what I use to create releases on the course website (if you watch the course repo you can be notified when I make major updates that way)

remotes are other copies that are linked, like GitHub.

7.7. Git Config#

cat config
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true
[remote "origin"]
	url = https://github.com/introcompsys/github-inclass-brownsarahm.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
	remote = origin
	merge = refs/heads/main
[branch "organtzation"]
	remote = origin
	merge = refs/heads/organtzation
[branch "2-create-an-about-file"]
	remote = origin
	merge = refs/heads/2-create-an-about-file

This file tracks the different relationships between your local copy and remots that it knows. This repository only knows one remote, named origin, with a url on GitHub. A git repo can have multiple remotes, each with its own name and url.

it also maps each local branch to its corresponding origin and the local place you would merge to when you pull from that remote branch.

7.8. Git Index#

cd ..
ls
README.md	about.md	hello.md
git status
On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	hello.md

nothing added to commit but untracked files present (use "git add" to track)
git add .

git status
On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	new file:   hello.md

git write-tree
9e404c8b4b67cc0572174531d3b7364e9c88fd94
git cat-file -p 9e404c8b4b67cc0572174531d3b7364e9c88fd94
040000 tree 95b60ce8cdec1bc4e1df1416e0c0e6ecbd3e7a8c	.github
100644 blob bcc1d9287b5cae329629cf3e0779b065b72dad7a	README.md
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	about.md
100644 blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad	hello.md
git cat-file -t 9e404c8b4b67cc0572174531d3b7364e9c88fd94
tree

To better understand this, we can compare it to the last commit.

To view an object we actually only need enough characters to make the name unique. 4 worked for me here:

git cat-file -p 3c99
tree 2c4495ca1dc6868006365dff726cccea781cea61
parent ec3dd020d2767e45b6cd61f7c4ea5f6f2be9a3d8
author Sarah M Brown <brownsarahm@uri.edu> 1663191601 -0400
committer Sarah M Brown <brownsarahm@uri.edu> 1663191601 -0400

create about

the object for a commit is the information about the commit:

  • the tree is the snapshot that was indexed (what we want to compare the above to).

  • the parent ithe previous commit

  • then there’s author & committer information and time

  • last is the commit message

So, we can look at this tree object

git cat-file -p 2c4495
040000 tree 95b60ce8cdec1bc4e1df1416e0c0e6ecbd3e7a8c	.github
100644 blob bcc1d9287b5cae329629cf3e0779b065b72dad7a	README.md
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391	about.md

we can see that the hello.md blob object is the difference and the other two files even have the same hash (so they were not changed).

We say that that part is only the the staging area, beacue this item is added to the objects, but it is not in the commit history yet.

7.9. Git Objects#

All of these blob objects and trees are stored in the objects folder

cd .git/objects/
ls

it has folders with 2 character names

25	35	59	84	9d	aa	e6	f9
2c	3c	5d	8b	a0	ad	e7	info
30	40	7a	8c	a4	d2	ec	pack
cd 25
ls

that contain files that match commits and other objects’ hashes.

ecf2cf579d20d41c1ae2f9844662bbd4e43315

For example, we can confirm the last commit to main

cd ../3c
ls
9980a4883782386ea3112d521b1b95997b0be6

These objects are not human readable though as is

cat 9980a4883782386ea3112d521b1b95997b0be6
x??A
?:m\x?;???^?<?x?n??2?4??Gp?<x?e[????x?E??0t˜?z?%O]?9:???w	͓??T|J???"E
                        ???"m????d??s#E?*?j>?>J	```

7.10. Review today’s class#

  1. Practice with git log and redirects to write the commit history of your main branch for your kwl chart to a file gitlog.txt and commit that file to your kwl repo.

  2. Review the notes

  3. Update your kwl chart with what you have learned or new questions

7.11. Prepare for Next Class#

  1. Try exploring your a repo manually and bring more questions

  2. Make sure that you have a grading contract that has been reviewed at least once

7.12. More Practice#

  1. Read about different workflows in git and add responses to the below in a workflows.md in your kwl repo. Two good places to read from are Git Book and the atlassian Docs

  2. Contribute either a glossary term, cheatsheet item, or additional resource/reference to your group repo.

  3. Complete one peer review of a team mate’s contribution

## Workflow Reflection

1. What advantages might it provide that git can be used with different workflows?
1. Which workflow do you think you would like to work with best and why?
1. Describe a scenario that might make it better for the whole team to use a workflow other than the one you prefer.  

7.13. Questions#

7.13.1. Can you fork a repo you do not have access to?#

You cannot fork a repository that you do not have read access to. If it is private, the owner can allow or not allow forking. You can fork public repos and then submit PRs to suggest changes. Try making a contribution to the course site. Add a glossary entry, link a term from the notes to the glossary or fill in a caption.

7.13.2. what should I do if git crashes and I lose progress?#

git should not ever have this problem. You should not ever lose work again at all if you commit and push frequently.

I lost my whole computer a few months before I finished my PhD. I only lost 1 day’s worth of work though. The morning’s worth of work that caused the crash and the afternoon that day recovering the computer (new harddrive and Linux instead of Windows) and then I pulled my code and kept working.

7.13.3. Which is easier/better to use: the git CLI, or the GitHub website?#

the git CLI is more complete and more flexible, you can batch changes together into larger commits, for example. It is better for most work. GitHub is simple and less intimidating for novices so for small changes it is great.

7.13.4. What most important function that git can perform?#

restoring previous versions.

7.13.5. When encountering a merge conflict, is it best to use the terminal/bash window or the GitHub merging features?#

You can use the GitHub one for basic merge conflicts. For complex ones GitHub will give you terminal commands to download the work and work offline.

7.13.6. Why is there not an undo command if you delete a file or repo on git?#

There is a way to undo with git, there is not a way to undo rm in bash, unless you are using some sort of backup tool, or git.

7.13.7. Would a merge conflict or any kind of problem emerge if there’s a file exclusive to a branch that isn’t main, when we try to merge that branch’s contents into main?#

No, there is nothing that there can be a conflict with. That is one strategy to use to avoid conflicts, to have multiple, smaller files.

7.13.8. How often should I make a new branch in my KWL repo?#

One for each group of work you want graded. Ideally about one per course session.

7.13.9. how is git different than other coding languages?#

Git is a program that manages files, it is not a programming language. It is a program with many subroutines and designed for use on a shell, but still not a language. Languages have to have certain features.

7.13.10. where is git used mostly (outside this classroom/university)?#

Literally almost everywhere people write code.

7.13.11. One git scenario, I feel like I will reach a point where there are just too many branches if I’m supposed to make a new branch every time I edit something.#

You can delete a branch

7.13.12. whats the difference between git fork and git clone#

a fork keeps the copy on GitHub (or another server) a clone makes it local

7.13.13. Can we add arguments to make git more specific with any type of errors that may occur, ex merge conflicts?#

Git is very powerful and can be given more strict guidance on how to do a merge that may change if conflicts are produced or not.

7.13.14. How does GitHub differ from bitbucket?#

Pricing, being owned by Microsoft instead of Atlassian, and some in browser features. GitHub has a more advanced API and continuous integration lately. It also has classroom features that I use (to let you all create repos that I can manage and Mark can see automatically) that other git hosts do not have.

7.13.15. will we be constantly updating a .md file to the kwl repo for a detailed knowledge base on each of the items listed in the chart?#

You only need to add files as advised.

7.13.16. how often should we push information from the kwl table into each .md file?#

You should commit and push your work frequently. This allows you to have a back up and restore your work freqently?

7.13.17. If we run into errors using CLI tools what are ways or resources that can help debug and fix errors.#

It depends on the error. We’ll see some and I encourage you to expand upon this question.