Why is the object file not human readable?
Contents
16. Why is the object file not human readable?#
The object file displays as random weird characters because it is written to disk in a different format than our terminal reads it in. It is the specific instructions (from the assembly) and memory addresses written to a file in binary. Our terminal reads it one byte (8bits) at a time and interprets those as characters.
Today, we will see how characters are represented as integers, then binary, and read and write files in binary of strings we know to see how it what happens and mimic something like how that object file happened by writing as binary and then reading it as characters.
16.1. Prelude: REPL#
One way we can use many interpreted languages, including Python is to treat the interpreter like an application and then interact with it in the terminal.
This is called the Read, Execute, Print Loop or REPL.
We can execute simple expressions like mathemeatical expressions:
4+5
and it will print the result
9
We can create variables
name = 'sarah'
but assignment does not return anything, so nothing is printed.
We can see the value of a variable, by typing its name
name
and then it prints the value
'sarah'
16.2. Characters to Integers#
Python provides a builtin function ord
to find the Unicode key for that character.
[ord(char) for char in name]
We get the integer representtion for each one.
[115, 97, 114, 97, 104]
Note that it’s important to try multiple things if you aren’t sure the format you are getting back or to check the official documents, as Denno found by testing this with his name in lowercase.
[ord(char) for char in 'denno']
Coincidentally, the letters in his name all fall into the hundreds and only require 0 and 1 in decimal, so it looks like they could be 3 bit binary numbers.
[100, 101, 110, 110, 111]
Python also allows us to change content into a byte array:
bytearray(34)
It does not display in the 0 and 1 like a string that we migth expect, but it does change the type.
bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
We can then write this to a file, by getting the actual byes of each item. First, we will save our list in a variable.
We will use the Python with
keyword to open the file so that we do not to need to separately open and close the file.
in the open
function, the first parameter is the file name and the second is the mode. In this case, we want to write (‘w’) in binary (‘b’). f
is the filestream.
name_ints = [ord(char) for char in name]
with open("name.txt",'wb') as f:
... f.write(bytes(bytearray(name_ints)))
...
it returns 5 in this case, because it wrote out 5 bytes, one per character.
5
If we exit python we can look at the file and see how it is interpreted.
exit()
Remember we wrote, as binary, the numbers, 115, 97, 114, 97, 104
to the file
cat name.txt
but it is interpretted as the string by the terminal.
sarah
python
Python 3.8.6 (v3.8.6:db455296be, Sep 23 2020, 13:31:39)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Back in Python, we can read the file as binary
with open("name.txt",'rb') as f:
print(f.read())
This is binary, but the python interpreter also prints it as a string, the b
indicates the object is actually binary type.
b'sarah'
If we read it not as binary, it returns as the string
with open("name.txt",'r') as f:
print(f.read())
sarah
We can also confirm the type
with open("name.txt",'rb') as f:
content = f.read()
type(content)
<class 'bytes'>
The with keyword is alternative to:
f = open("name.txt",'rb')
f.read()
f.close()
Practice exercise
msg = [68, 114, 46, 32, 66, 114, 111, 119, 110,10]
bytes(bytearray(msg))
b'Dr. Brown\n'
16.3. Bit level Operations#
We can operate on integers bit by bit.
If we shift 3 one to the right we transform from 11
to 1
, we can use the >>
operator to do this.
3 >> 1
1
We can shift it the other way too: 11
to 110
so we get from 3 to 6
3 << 1
6
We can also use logical operators
1 & 0
0
They operate bit by bit so 11
and 01
ends at 01
3 & 1
1
The inputs to these operators must be integers.
type(3)
<class 'int'>
3.4 & 4.5
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for &: 'float' and 'float'
And Xor
3 ^ 3
0
and or
3 | 1
3
16.4. Not and Negative numbers in binary#
We can also invert the bits, which flips all of bits.
~3
the output is
-4
We started with 00000011 When we flip all of the bits, we get 11111100. To see how this translates to decimal, we then have to think about how negative numbers are calulated.
There are two common ways, but the one in practice here is called two’s complement. The two’s complement is calculated by adding one to the one’s complement, so we will calculate that first.
One’s complement is to invert the bits, so in one’s complement 11111100
is -3.
If we add one to that, we get the two’s compliment, 11111101
so in two’s compliment, that is -3.
If we calculate the two’s compliment of 4:
in binary 4 is 00000100
, then we do the on’es complement, 11111011
and then add one, 11111100
, which is the what we got above by flipping the bits of 3.
note remember the first bit is the sign. 1 is for negative numbers and 0 is for positive. We used 8bit above to save visual space, but if represented as 16, 32, or 64 bit integer it would be the same.
We can also check the minimum number of bits for a number
(54).bit_length()
6
We can also get the binary representation for an integer with the builtin bin
bin(3)
'0b11'
This case it write the sign before the letter b, instead of using two’s complement or ones complement.
bin(-5)
'-0b101'
we can also write numbers in that format and get the integer back.
0b11
3
and we can do the bitwise operations on binary numbers directly as well.
0b111 & 0b0111
7
0b111 & 0b0100
4
bin(0b111 & 0b0100)
'0b100'
bin(0b1111 & 0b0100)
'0b100'
bin(0b1111 ^ 0b0100)
'0b1011'
bin(0b1111 - 0b0100)
'0b1011'
exit()
16.5. Review today’s class#
Add
bitwise.md
to your kwl and write the bitwise operations required for the following transformations:4 -> 128 12493 - > -12494 127 -> 15 7 -> 56 4 -> -5
For the following figure out the bitwise operator:
45 (_) 37 = 37 45 (_) 37 = 45 3 (_) 5 = 7 6 (_) 8 = 0 10 (_) 5 = 15
Create
readingbytes.md
and answer the following:1. if a file had the following binary contents, what would it display in the terminal? Describe how you can figure this out manually and check it using C or Python. '01110011011110010111001101110100011001010110110101110011' 1. What is the contents of the `sample.bn` if `cat sample.bn` is: ` ¢¶"*`
Read about integer overflow and in
overflow.md
describe what it is, use an example assuming an 8 bit system.
16.6. Prepare for Next Class#
In
fractionalbinary.md
use 8 bits to represent the following points where 4 bits are >1 and the other 4 bits are 1/2, 1/4, 1/8, 1/16th:
3.75
7.5
11.625
5.1875
Add to your file some note about the limitations of representing non integer values this way. How much would using more bits help with, what limitations are not resolved y adding more bits.
16.7. More Practice#
(priority) Add to
overflow.md
how integer overflow is handled in Python, C, Javascript, and one other language of your choice.Add to
readingbytes.md
and exmaple of how machine code that was a 3 bit instruction followed by an 8 bit address might render.
16.8. Questions#
16.8.1. How does byte(bytearray() work?#
It casts it to a bytearray then a bytes object. see the docs
16.8.2. What are the ‘&’ and ‘|’ operators used for?#
They’re occasionally used for calculating things if highly compressed, otherwise they are used
16.8.3. What do you mean by “ones compliment”?#
This is ont type of representation for negative numbers in binary. In ones complement, we flip all of the bits. The first bit is always the indicator for a negative or positive number. So, if we want to represent negative thirteen in eight bits we would start with thirteen: 00001101 then flip all fo the bits to get 11110010. If we were to read this number, we would know that this is a negative number because the first digit is a 1, so to figure out the value, we would flip all of the bits back and get 00001101.
In any notation where the first bit is the sign, the highest number we can represent is reduce in favor of representing both positive and negative numbers. That is, if we drop to four bits instead of being able to have 1111 be the highest number: 15 and 0000 be the lowest number: 0. The highest number would be 0111 = 7, and the lowest will be 1000 = -7.
Two’s complement flips the bits and adds one. This is more commonly used because we can represent a large range of numbers.
Tip
Comparing these is a good deeper Exploration