What happens when I build code in C?
Contents
15. What happens when I build code in C?#
15.1. What is buiding code?#
Building is transforming code from the input format to the final format.
This can mean different things in different contexts. For example:
the course website is built from markdown files in to html output
a C program is built from source code to executables
We sometimes say that compiling takes code from source to executable, but this process is actually multiple stages and compiling is one of those steps.
We will focus on what has to happen more than how it all happens.
CSC301, 402, 501, 502 go into greater detail on how languages work.
Our goal is to:
(where applicable) give you a preview
get enough understanding of what happens to know where to look when debugging
15.2. An overview#
15.3. Setup for today#
mkdir compilec
cd compilec/
ls
we have an empty folder. This will be importatnt.
nano hello.c
And we’ll paste in the following
#include <stdio.h>
void main () {
printf("Hello world\n");
}
cat hello.c
#include <stdio.h>
void main () {
printf("Hello world\n");
}
ls
hello.c
we have a single file
15.4. Preprocessing with gcc#
First we handle the preprocessing which pulls in headers that are included. We will use the compiler gcc
gcc -E hello.c -o hello.i
-E
stops after preprocessing-o
makes it write the .i file and passes the file name for it
If it succeeds, we see no output, but we can check the folder
ls
now we have a new file
hello.c hello.i
This file is much longer than the one we started with
cat hello.i |wc -l
542
cat hello.c |wc -l
5
So we will look at only a few rows here.
cat hello.i | head -n 10
# 1 "hello.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 366 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "hello.c" 2
# 1 "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdio.h" 1 3 4
# 64 "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdio.h" 3 4
# 1 "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/_stdio.h" 1 3 4
This gives us a version with the header file’s contents literally pasted in to replace the original #include
statement
and at the bottom
cat hello.i | tail -n 10
is our original program
extern int __vsnprintf_chk (char * restrict, size_t, int, size_t,
const char * restrict, va_list);
# 408 "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdio.h" 2 3 4
# 2 "hello.c" 2
void main () {
printf("Hello world\n");
}
At the bottom of the file, we see the original code with an extra bit of information that helps the compiler write better error messages, by saying where contents came from.
15.5. Compiling#
Next we take our preprocessed file and compile it to get assembly code.
gcc -S hello.i
-S
tells it to produce assembly
hello.c:3:1: warning: return type of 'main' is not 'int' [-Wmain-return-type]
void main () {
^
hello.c:3:1: note: change return type to 'int'
void main () {
^~~~
int
1 warning generated.
We got a warning, but that is okay
ls
and then we have a new file as well:
hello.c hello.i hello.s
The assembly code is also readable
cat hello.s
There are many more steps and they are lower level programs, but it is text stored in the file.
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 10, 15, 6 sdk_version 10, 15, 6
.globl _main ## -- Begin function main
.p2align 4, 0x90
_main: ## @main
.cfi_startproc
## %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
leaq L_.str(%rip), %rdi
movb $0, %al
callq _printf
popq %rbp
retq
.cfi_endproc
## -- End function
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "Hello world\n"
.subsections_via_symbols
This is like the assembly code we saw in the hardware simulator, but this is the fully expressive assembly language that our computers use, not the toy one for the simulator.
15.6. Assembling#
Assembling is to take the assembly code and get object code. Assembly is relatively broad and there are families of assembly code, it is also still written for humans to understand it readily. It’s more complex than source code because it is closer to the hardware. The object code however, is specific instructions to your machine and not human readable.
gcc -c hello.s -o hello.o
-c
tells it to stop at the object file-o
again gives it the name of the file to write
ls
hello.c hello.i hello.o hello.s
now we see a new file, the .o
let’s look at it
cat hello.o
This is not machine readable, though:
<__compact_unwind__LD( P?__eh_frame__TEXTH@p
h2
??
PUH??H?= ??]?Hello world
zRx
-_main_printf```
MacOS tried to help a little but it’s still not very readable.
15.7. Linking#
Now we can link it all together; in this program there are not a lot of other depdencies, but this fills in anything from libraries and outputs an executble
gcc -o hello hello.o -lm
-o
flag specifies the name for output-lm
tells it to link from the .o file.
again we can look at the directory
ls
we have a new executable file
hello hello.c hello.i hello.o hello.s
We can see that the file as execute permissions:
ls -la
total 176
drwxr-xr-x 7 brownsarahm staff 224 Oct 31 16:57 .
drwxr-xr-x 11 brownsarahm staff 352 Oct 31 16:42 ..
-rwxr-xr-x 1 brownsarahm staff 49424 Oct 31 16:57 hello
-rw-r--r-- 1 brownsarahm staff 63 Oct 31 16:44 hello.c
-rw-r--r-- 1 brownsarahm staff 22932 Oct 31 16:45 hello.i
-rw-r--r-- 1 brownsarahm staff 760 Oct 31 16:52 hello.o
-rw-r--r-- 1 brownsarahm staff 647 Oct 31 16:49 hello.s
Finally we can run our program
./hello
Hello world
15.8. Putting it all together#
We can also do all of it at once, to see how it’s different let’s clean up the directory:
rm hello.i hello.s hello.o
ls
hello.c
and now we can tell it to compile and link
gcc -Wall -g -o hello hello.c -lm
hello.c:3:1: warning: return type of 'main' is not 'int' [-Wmain-return-type]
void main () {
^
hello.c:3:1: note: change return type to 'int'
void main () {
^~~~
int
1 warning generated.
ls
hello hello.c hello.dSYM
we have the file again as expected.
15.9. Working with multiple files#
nano main.c
/* Used to illustrate separate compilation.
Created: Joe Zachary, October 22, 1992
Modified:
*/
#include <stdio.h>
void main () {
int n;
printf("Please enter a small positive integer: ");
scanf("%d", &n);
printf("The sum of the first n integers is %d\n", sum(n));
printf("The product of the first n integers is %d\n", product(n));
}
nano help.c
/* Used to illustrate separate compilation
Created: Joe Zachary, October 22, 1992
Modified:
*/
/* Requires that "n" be positive. Returns the sum of the
first "n" integers. */
int sum (int n) {
int i;
int total = 0;
for (i = 1; i <= n; i++)
total += i;
return(total);
}
/* Requires that "n" be positive. Returns the product of the
first "n" integers. */
int product (int n) {
int i;
int total = 1;
for (i = 1; i <= n; i++)
total *= i;
return(total);
}
ls
hello hello.c hello.dSYM help.c main.c
First we can make the two objects:
gcc -Wall -g -c main.c
but here we get an error:
main.c:10:1: warning: return type of 'main' is not 'int' [-Wmain-return-type]
void main () {
^
main.c:10:1: note: change return type to 'int'
void main () {
^~~~
int
main.c:14:52: error: implicit declaration of function 'sum' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
printf("The sum of the first n integers is %d\n", sum(n));
^
main.c:15:56: error: implicit declaration of function 'product' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
printf("The product of the first n integers is %d\n", product(n));
^
1 warning and 2 errors generated.
We can get around this, by telling main about the functions by adding
int sum(int n);
int product (int n);
to the main.c
nano main.c
gcc -Wall -g -c main.c
main.c:13:1: warning: return type of 'main' is not 'int' [-Wmain-return-type]
void main () {
^
main.c:13:1: note: change return type to 'int'
void main () {
^~~~
int
1 warning generated.
and then the helper code
gcc -Wall -g -c help.c
ls
hello hello.dSYM help.o main.o
hello.c help.c main.c
Tip
One reason we split code is to make it readable, but another reason is what we just did. We can compile each file separately, when your code is large and compiling takes a long time, splitting it will mean you only have to recompile the file(s) you have recently changed and relink, instead of recompiling everything.
and finally we link them.
gcc -o demo main.o help.o -lm
demo hello.c help.c main.c
hello hello.dSYM help.o main.o
./demo
Please enter a small positive integer: 4
The sum of the first n integers is 10
The product of the first n integers is 24
./demo
Please enter a small positive integer: 7
The sum of the first n integers is 28
The product of the first n integers is 5040
cat main.o
????????
__text__TEXTg?t
?__cstring__TEXTg}__debug_str__DWARF???__debug_abbrev__DWARF?GB__debug_info__DWARF?^?__apple_names__DWARF?<?__apple_objc__DWARF{$#__apple_namespac__DWARF?$G__apple_types__DWARF?Gk__compact_unwind__LD ?__eh_frame__TEXT0@?
h__debug_line__DWARFp2
4 $
PUH??H??H?=X??H?=rH?u??E????}??E??H?=U?ư??}??E??H?=a?ư?H??]?Please enter a small positive integer: %dThe sum of the first n integers is %d
The product of the first n integers is %d
Apple clang version 12.0.0 (clang-1200.0.32.2)main.c/Library/Developer/CommandLineTools/SDKs/MacOSX.sdkMacOSX.sdk/Users/brownsarahm/Documents/inclass/systems/compilecmainnint%?|?:
;
?4:
;
I$>
Z
/6?|?V?HSAH
j?|,?2HSAH
????HSAH
????HSAH
0??
4?V$gzRx
main.c ?$????????gA?C
??K4fX@?8fX?]-TM-B-92-'--
3&+
_product_main_sum_printf_scanf```
15.10. What does the -o
option do?#
we can remove it to see
gcc main.o help.o -lm
ls
in this case it makes up a name for the executable
a.out hello hello.dSYM help.o main.o
demo hello.c help.c main.c
ls -ls
total 352
104 -rwxr-xr-x 1 brownsarahm staff 50072 Oct 31 17:28 a.out
104 -rwxr-xr-x 1 brownsarahm staff 50072 Oct 31 17:23 demo
104 -rwxr-xr-x 1 brownsarahm staff 49688 Oct 31 17:05 hello
8 -rw-r--r-- 1 brownsarahm staff 63 Oct 31 16:44 hello.c
0 drwxr-xr-x 3 brownsarahm staff 96 Oct 31 17:05 hello.dSYM
8 -rw-r--r-- 1 brownsarahm staff 476 Oct 31 17:10 help.c
8 -rw-r--r-- 1 brownsarahm staff 2364 Oct 31 17:18 help.o
8 -rw-r--r-- 1 brownsarahm staff 381 Oct 31 17:17 main.c
8 -rw-r--r-- 1 brownsarahm staff 2392 Oct 31 17:18 main.o
./a.out
Please enter a small positive integer: 9
The sum of the first n integers is 45
The product of the first n integers is 362880
so it still works without specifying a name for the executable, but it’s a lot neater to give it a meaningful name.
15.11. Is the exectuable file binary?#
Yes. We’ll discuss this more carefully next class. It is binary, but the terminal app splits the binary every 8 bits and converts it to a character. Since that’s now how the file was written, we get random looking characters.
cat demo
????X? H__PAGEZERO?__TEXT@@__text__TEXT>?>?__stubs__TEXT?>
??__stub_helper__TEXT
?$
??__cstring__TEXT0?}0?__unwind_info__TEXT??H???__DATA_CONST@@@@__got__DATA_CONST@?__DATA?@?@__la_symbol_ptr__DATA??__data__DATA?H__LINKEDIT?@??"?? ? @?H??h?0
PP?
/usr/lib/dyld?~U`<???)#?>?2
a*(?>
8d/usr/lib/libSystem.B.dylib&?)??UH??H??H?=???H?=+H?u??E?????}??E??:H?=?ư??}??E??_H?=?ư?H??]Ð????????UH??}??E??E??E?;E???E?E?E?E????E???????E?]?UH??}??E??E??E?;E???E??E?E?E????E???????E?]??%?@?%?@L??@AS?%??h?????h?????Please enter a small positive integer: %dThe sum of the first n integers is %d
The product of the first n integers is %d
>44?>4
?&?#R@dyld_stub_binderQr?s@_printf?@_scanf?__mh_execute_header/main3sum8product=?|?}?}?|p@?Yd?d?f?;`c.>?$>$gNgdYd?d?f?;`c.?>$?>$@N@.?>#$?>$>N>d>?>%?>*29__mh_execute_header_main_product_sum_printf_scanfdyld_stub_binder__dyld_private/Users/brownsarahm/Documents/inclass/systems/compilec/main.c/Users/brownsarahm/Documents/inclass/systems/compilec/main.o_mainhelp.c/Users/brownsarahm/Documents/inclass/systems/compilec/help.o_sum_product```
15.12. Review today’s class#
Update more rows on your KWL Chart based on what we did today.
Practice using gcc. Repeat steps we did in class, change the order of parameters; try skipping steps to produce errors, etc. Then in
gcctips.md
summarize what you learned as a list of tips and reminders on what the parameters do/why/when you would need them (or not). (to reinforce what we learned)Contribute to your group repo and review a team mate’s PR>
15.13. Prepare for Next Class#
Create
operators.md
and make some notes about what you know about operators. What kinds of operators are you familiar with? Which have you seen in programming? math?
15.14. More Practice#
(priority) Write two short programs that do the same thing in different ways and compile them both to assembly (eg using a for vs while loop to sum numbers up to a number). Check the assembly to see if they produce the same thing or if it’s different. Save your code (in code blocks) and notes about your findings in
assemblycompare.md
15.15. Questions After Class#
15.15.1. Is there a benefit to linking object files like we did today rather than just compiling a larger file with all of the functions declared and defined?#
Linking has to happen when you use libraries, we did this process to illustrate what happens. Also it’s much easier to read a larger file and can cut compilation time while you are working if you only have to recompile the small section you have edited recently.
15.15.2. is C the closest low level program that’s common now since no one uses assembly?#
15.15.3. What step of the compilation process is different from OS to OS?#
A lot of the differences are actually at the hardware level, even within an OS, but the object code and executales are system-or system type specific.
15.15.4. what can you build with assembly language?#
Anything!
15.15.5. Where can I read documentation on parameters and meanings?#
15.15.6. Are the weird symbols from the machine code mean anything to us? or is the computer trying to make sense of something with alphabetical significance?#
This is machine code, it is binary that is designed for your hardware to interpret as instructions.
When we display it to the terminal, your computer tries to interpret that binary as text, so it ends up as weird characters mostly.
15.15.7. I thought the final step was to convert code into binary? Is that something that we can’t look at?#
The object code and the exetuable are both binary
15.15.8. Just confirming? To get an executable a compiler basically takes in the high level code to assembly which is then outputted to object code that is then linked doing “gcc -o -lm” to create it?#
Yes!!
15.15.9. One thing I’m curious about is if it’s possible to distinguish between compile errors and linking errors at the very least, more on if anything can be done about them when they pop up.#
Delete the main
and help
object files and remove the declarations from the main.c
. Then try gcc -Wall -g main.c
. You’ll get a linker error, why?