What is a commit number?

12. What is a commit number?#

12.1. Admin#

spring break is like a time pause (you get an extra week on things assigned last week and this week)
grading updates PR is made

12.2. What is a hash?#

a hash is:

a fixed size value that can be used to represent data of arbitrary sizes
the output of a hashing function
often fixed to a hash table

Common examples of hashing are lookup tables and encryption with a cyrptographic hash.

A hasing function could be really simple, to read off a hash table, or it can be more complex.

For example:

Hash	content
0	Success
1	Failure

If we want to represent the status of a program running it has two possible outcomes: success or failure. We can use the following hash table and a function that takes in the content and returns the corresponding hash. Then we could pass around the 0 and 1 as a single bit of information that corresponds to the outcomes.

This lookup table hash works here.

In a more complex scenario, imagine trying to hash all of the new terms you learn in class. A table would be hard for this, because until you have seen them all, you do not know how many there will be. A more effective way to hash this, is to derive a hashing function that is a general strategy.

A cyrptographic hash is additionally:

unique
not reversible
similar inputs hash to very different values so they appear uncorrelated

Hashes can then be used for a lot of purposes:

message integrity (when sending a message, the unhashed message and its hash are both sent; the message is real if the sent message can be hashed to produce the same has)
password verification (password selected by the user is hashed and the hash is stored; when attempting to login, the input is hashed and the hashes are compared)
file or data identifier (eg in git)

12.3. Hashing in passwords#

Passowrds can be encrypted and the encrypted information is stored, then when you submit a candidate password it can compare the hash of the submitted password to the hash that was stored. Since the hashing function is nonreversible, they cannot see the password.

An attacker who gets one of those databases, cannot actually read the passwords, but they could build a lookup table. For example, “password” is a bad password because it has been hashed in basically every algorithm and then the value of it can be reversed. Choosing an uncommon password makes it less likely that your password exists in a lookup table.

echo "password" | git hash-object --stdin

f3097ab13082b70f67202aab7dd9d1b35b7ceac2

12.4. Hashing in Git#

In git we hash both the content directly to store it in the database (.git) directory and the commit information.

Recall, when we were working in our toy repo we created an empty repository and then added content directly, we all got the same hash, but when we used git commit our commits had different hashes because we have different names and made the commits at different seconds. We also saw that two entries were created in the .git directory for the commit.

Git as originally designed to use SHA-1. SHA-1 is weak. Git switched to hardened HSA-1 in response to a collision. Learn more about the SHA-1 collision attach

In that case it adjusts the SHA-1 computation to result in a safe hash. This means that it will compute the regular SHA-1 hash for files without a collision attack, but produce a special hash for files with a collision attack, where both files will have a different unpredictable hash. from.

they will change again soon

We can use the git hashing algorithm without writing to the repo too:

echo "it's almost break" | git hash-object --stdin

d49aa364a349587fc438e7a738d58b8eb06b040f

Then we get the hash back. Let’s change just one character

echo "it's almost brak" | git hash-object --stdin

671ece673e365c943997c861be56a48977ceff77

and we see that it changes a lot.

echo "it's almost braek" | git hash-object --stdin

185150671674540c2229dac8bda22ecba7bbc3f8

and again.

git uses the SHA hash primarily for uniuqeness, not privacy

It does provide some security assurances, because we can check the content against the hash to make sure it is what it matches.

This is a Secure Hashing Algorithm that is derived from cryptography. Because it is secure, no set of mathematical options can directly decrypt an SHA-1 hash. It is designed so that any possible content that we put in it returns a unique key. It uses a combination of bit level operations on the content to produce the unique values.

The SHA-1 Algorithm hashes content into a fixed length of 160 bits.

This means it can produce \(2^160\) different hashes. Which makes the probability of a collision very low.

The number of randomly hashed objects needed to ensure a 50% probability of a single collision is about \(2^{80}\) (the formula for determining collision probability is p = (n(n-1)/2) * (1/2^160)). \(2^{80}\) is 1.2 x 1024 or 1 million billion billion. That’s 1,200 times the number of grains of sand on the earth.

– A SHORT NOTE ABOUT SHA-1 in the Git Documentation

12.4.1. Workign with git hashes#

Mostly, a shorter version of the commit is sufficient to be unique, so we can use those to refer to commits by just a few characters:

minimum 4
must be unique

cd ../github-in-class-brownsarahm-1/
git log

commit 4fa9114632f26ec590eec7e91712596085d7c442 (HEAD -> main, origin/main, origin/HEAD)
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Tue Feb 14 12:48:11 2023 -0500

    bug fix

commit 6a2e1cc65204d0c05acc477e7d5e704f989a68dc
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Thu Feb 9 13:26:28 2023 -0500

    add jacket

commit 8e2fe11b4ed5ac39b22f680a4c627fd88a6628fd
Author: Sarah Brown <brownsarahm@uri.edu>
Date:   Thu Feb 9 13:22:43 2023 -0500

    Update about.md

For most project 7 characters is enough and by default, git will give you 7 digits if you use --abbrev-commit and git will automatically use more if needed.

git log --abbrev-commit --pretty=oneline

4fa9114 (HEAD -> main, origin/main, origin/HEAD) bug fix
6a2e1cc add jacket
8e2fe11 Update about.md
bcdc409 Merge pull request #6 from introcompsys/organization
56f29ca (origin/organization) Merge branch 'main' into organization
cffcf05 add major
f2844b2 update title
812245d (organization) start organizing
ef45e77 start organizing
4b89dff (my_branch_checkedoutb, my_branch) Merge pull request #5 from introcompsys/fill-in-about
c28b4ad (origin/fill-in-about, fill-in-about) add my name
0169e39 Merge pull request #4 from introcompsys/2-create-an-about-file
57de0cd (origin/2-create-an-about-file, 2-create-an-about-file) create empty about
3f54148 closes #1
4db10e5 Initial commit

12.5. What is a Number ?#

a mathematical object used to count, measure and label

12.6. What is a number system?#

While numbers represent quantities that conceptually, exist all over, the numbers themselves are a cultural artifact. For example, we all have a way to represent a single item, but that can look very different.

for example I could express the value of a single item in different ways:

1
I

In modern, western cultures our is called the hindu-arabic system, it consists of a set of numerals: 0,1,2,3,4,5,6,7,8,9 and uses a place based system with base 10.

invented by Hindu mathematicians in India 600 or earlier
called “Arabic” numerals in the West because Arab merchants introduced them to Europeans
slow adoption

We use a place based system. That means that the position or place of the symbol changes its meaning. So 1, 10, and 100 are all different values. This system is also a decimal system, or base 10. So we refer to the places and the ones (\(10^0\)), the tens (\(10^1\)), the hundreds(\(10^2\)), etc for all powers of 10.

Number systems can use different characters, use different strategies for representing larger quantities, or both.

12.6.1. Roman Numerals#

is both different characters and not place based.

There are symbols for specific values: 1=I, V=5, X=10, L =50, C = 100, D=500, M = 1000.

Not all systems are place based, for example Roman numerals. In this system the subsequent symbols are either added or subtracted, with no (nonidentity) multipliers based on position. Instead if the symbol to right is the same or smaller, add the two together, if the left symbol is smaller, subtract it from the one on the right.

Then

III = 1+1+1 = 3
IV = -1 + 5 = 4
VI = 5+1 = 6
XLIX = -10 + 50 -1 +10 = 49.

This feel hard because it is unfamiliar

12.6.2. Decimal#

To represent larger numbers than we have digits on we have a base (10) and then.

\[10 = 10*1 + 1*0\]

\[22 = 10*2 + 1*2 \]

we have the ones (\(10^0) place, tens (\)10^1\() place, hundreds (\)10^2) place etc.

12.6.3. Binary#

Binary is any base two system, and it can be represented using any different characters.

Binary number systems have origins in ancient cultures:

Egypt (fractions) 1200 BC
China 9th century BC
India 2nd century BC

In computer science we use binary because mechanical computers began using relays (open/closed) to implement logical (boolean) operations and then digital computers use on and off in their circuits.

We represent binary using the same hindu-arabic symbols that we use for other numbers, but only the 0 and 1(the first two). We also keep it as a place-based number system so the places are the ones(\(2^0\)), twos (\(2^1\)), fours (\(2^2\)), eights (\(2^3\)), etc for all powers of 2.

so in binary, the number of characters in the word binary is 110.

\[ 10 => 2*1 + 1*0 = 2\]

so this 10 in binary is 2 in decimal

\[ 1001 => 8*1 + 4*0 + 2*0 + 1*1 = 9\]

12.6.4. Octal#

Is base 8. This too has history in other cultures, not only in computer science. It is rooted in cultures that counted using the spaces between fingers instead of counting using fingers.

use by native americans from present day CA

and

Pamean languages in Mexico

\[ 10 = > 8*1 + 1*0 = 8\]

so 10 in octal is 8 in decimal

\[ 401 => 64*4 + 8*0 + 1*1 = 257\]

This numbering system was popular in 6 bit and 12 bit computers, but is has origins before that. Native Americans using the Yuki Language (based in what is now California)used an octal system because they count using the spaces between fingers and speakers of the Pamean languages in Mexico count on knuckles in a closed fist. Europeans debated using decimal vs octal in the 1600-1800s for various reasons because 8 is better for math mostly. It is also found in Chinese texts dating to 1000BC.

As in binary we use hindu-arabic symbols, 0,1,2,3,4,5,6,7 (the first eight). Then nine is 11.

In computer science we use octal a lot because it reduces every 3 bits of a number in binary to a single character. So for a large number, in binary say 101110001100 we can change to 5614 which is easier to read, for a person.

12.6.5. Hexadecimal#

base 16, commonin CS because its 4 bits. we use 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F.

This is how the git hash is 160 bits, or 20 bytes (one byte is 8 bits) but we represent it as 40 characters. 160/4=40.

cd ../test

git log

commit 188a75ef66b6a85be0ab68d8575ec27808881dfc (HEAD -> test)
Author: Sarah M Brown <brownsarahm@uri.edu>
Date:   Thu Feb 23 13:32:19 2023 -0500

    first commit

git status

On branch test
nothing to commit, working tree clean

cat .git/HEAD 

ref: refs/heads/test

git checkout main

Switched to branch 'main'

cat .git/HEAD 

ref: refs/heads/main

cat .git/refs/heads/main 

90f8d145f3b264e99832b47a662ed5d50b687e7a

## Review today's class

```{include} ../_review/2023-03-07.md

12.7. Prepare for Next Class#

Make sure you can run python code from bash and that you hve gh CLI installed. You will need to be able to run gh and python in the same terminal. This should happen for free on not-Windows or WSL. On Windows, check the GitBash settings.

12.8. More Practice#

Learn more about how git is working on changing from SHA-1 to SHA-256 and answer the transition questions below) gittransition.md
find 2 more real world examples of using other number systems (either different bases or different symbols and bases) that are current. Describe them in numbers.md

## transition questions
Why make the switch?
What impact will the switch have on how git works?
Which developers will have the most work to do because of the switch?