15. What happens when I build code in C?

15.1. What is building code?

Building is transforming code from the input format to the final format.

This can mean different things in different contexts. For example:

  • the course website is built from markdown files into HTML output

  • a C program is built from source code into an executable

We sometimes say that compiling takes code from source to executable, but the process actually has multiple stages, and compiling is only one of them.

We will focus on what has to happen more than how it all happens.

CSC301, 402, 501, 502 go into greater detail on how languages work.

Our goal is to:

  • (where applicable) give you a preview

  • get enough understanding of what happens to know where to look when debugging

15.2. An overview

Figure: flowchart of the four build stages (preprocess, compile, assemble, link)

15.3. Setup for today

mkdir compilec
cd compilec/
ls

We have an empty folder. This will be important, because each step we run will add a new file that we can spot with ls.

nano hello.c

And we’ll paste in the following

#include <stdio.h>

void main () {
 printf("Hello world\n");
}
cat hello.c
#include <stdio.h>

void main () {
 printf("Hello world\n");
}
ls
hello.c

we have a single file

15.4. Preprocessing with gcc

First we handle preprocessing, which pulls in the contents of the header files that are included. We will use the compiler gcc.

gcc -E hello.c -o hello.i
  • -E stops after preprocessing

  • -o names the output file (here hello.i); without it, gcc -E prints the preprocessed code to the terminal

If it succeeds, we see no output, but we can check the folder

ls

now we have a new file

hello.c	hello.i

This file is much longer than the one we started with

cat hello.i |wc -l
     542
cat hello.c |wc -l
       5

So we will look at only a few lines here.

cat hello.i | head -n 10
# 1 "hello.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 366 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "hello.c" 2
# 1 "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdio.h" 1 3 4
# 64 "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdio.h" 3 4
# 1 "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/_stdio.h" 1 3 4

This gives us a version with the header file’s contents literally pasted in place of the original #include statement.

At the bottom,

cat hello.i | tail -n 10

we find our original program:

extern int __vsnprintf_chk (char * restrict, size_t, int, size_t,
       const char * restrict, va_list);
# 408 "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdio.h" 2 3 4
# 2 "hello.c" 2

void main () {
 printf("Hello world\n");
}

Around our original code are lines starting with #, called linemarkers. They record which file and line each part came from, which helps the compiler write better error messages: the warnings in the next step point to hello.c even though we compile hello.i.

15.5. Compiling

Next we take our preprocessed file and compile it to get assembly code.

gcc -S hello.i
  • -S stops after compiling and writes the assembly code to hello.s

hello.c:3:1: warning: return type of 'main' is not 'int' [-Wmain-return-type]
void main () {
^
hello.c:3:1: note: change return type to 'int'
void main () {
^~~~
int
1 warning generated.

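We got a warning, but that is okay: warnings do not stop compilation. The C standard says main should return int, so int main is the correct form, but void main still builds here.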

ls

and then we have a new file as well:

hello.c	hello.i	hello.s

The assembly code is also readable text:

cat hello.s

There are many more lines, because each instruction is a much lower-level step than a line of C, but it is still plain text stored in the file.

	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 10, 15, 6	sdk_version 10, 15, 6
	.globl	_main                   ## -- Begin function main
	.p2align	4, 0x90
_main:                                  ## @main
	.cfi_startproc
## %bb.0:
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	leaq	L_.str(%rip), %rdi
	movb	$0, %al
	callq	_printf
	popq	%rbp
	retq
	.cfi_endproc
                                        ## -- End function
	.section	__TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
	.asciz	"Hello world\n"

.subsections_via_symbols

This is like the assembly code we saw in the hardware simulator, but this is the full assembly language that our computers use (here x86-64, as registers like %rbp and %rsp show), not the toy one for the simulator.

15.6. Assembling

Assembling takes the assembly code and produces object code. Assembly is a broad category: there are families of assembly languages for different kinds of processors, and assembly is still written so that humans can read it. It is more detailed than C source code because it is closer to the hardware. Object code, however, is the specific binary instructions for your machine, and it is not human readable.

gcc -c hello.s -o hello.o
  • -c tells it to stop at the object file

  • -o again gives it the name of the file to write

ls
hello.c	hello.i	hello.o	hello.s

Now we see a new file, hello.o.

Let’s look at it:

cat hello.o

This is not human readable, though:

<__compact_unwind__LD( P?__eh_frame__TEXTH@p
                                            h2

??
  PUH??H?=	??]?Hello world
zRx
-_main_printf

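The terminal shows a few readable pieces, like the string Hello world and the symbol names _main and _printf, but most of it is not readable.

15.7. Linking

Now we can link it all together. This program does not have many other dependencies, but linking fills in anything needed from libraries (here, printf from the C standard library) and outputs an executable.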

gcc -o hello hello.o -lm
  • -o specifies the name of the output file

  • -lm links the math library (libm); this program does not use any math functions, so it would link without it

again we can look at the directory

ls

we have a new executable file

hello	hello.c	hello.i	hello.o	hello.s

We can see that the file has execute permissions (the x in -rwxr-xr-x):

ls -la
total 176
drwxr-xr-x   7 brownsarahm  staff    224 Oct 31 16:57 .
drwxr-xr-x  11 brownsarahm  staff    352 Oct 31 16:42 ..
-rwxr-xr-x   1 brownsarahm  staff  49424 Oct 31 16:57 hello
-rw-r--r--   1 brownsarahm  staff     63 Oct 31 16:44 hello.c
-rw-r--r--   1 brownsarahm  staff  22932 Oct 31 16:45 hello.i
-rw-r--r--   1 brownsarahm  staff    760 Oct 31 16:52 hello.o
-rw-r--r--   1 brownsarahm  staff    647 Oct 31 16:49 hello.s

Finally we can run our program

./hello
Hello world

15.8. Putting it all together

We can also do all of it at once. To see how that is different, let’s clean up the directory:

rm hello.i hello.s hello.o
ls
hello.c

Now we can tell it to compile and link in a single command. -Wall turns on many common warnings, and -g includes debugging information:

gcc -Wall -g -o hello hello.c -lm
hello.c:3:1: warning: return type of 'main' is not 'int' [-Wmain-return-type]
void main () {
^
hello.c:3:1: note: change return type to 'int'
void main () {
^~~~
int
1 warning generated.
ls
hello		hello.c		hello.dSYM

We have the executable again, as expected, along with hello.dSYM, the folder where macOS stores the debugging information from -g.

15.9. Working with multiple files

nano main.c
/* Used to illustrate separate compilation.

Created: Joe Zachary, October 22, 1992
Modified:

*/

#include <stdio.h>

void main () {
 int n;
 printf("Please enter a small positive integer: ");
 scanf("%d", &n);
 printf("The sum of the first n integers is %d\n", sum(n));
 printf("The product of the first n integers is %d\n", product(n));
}
nano help.c
/* Used to illustrate separate compilation

Created: Joe Zachary, October 22, 1992
Modified:

*/


/* Requires that "n" be positive. Returns the sum of the
  first "n" integers. */

int sum (int n) {
 int i;
 int total = 0;
 for (i = 1; i <= n; i++)
  total += i;
 return(total);
}


/* Requires that "n" be positive. Returns the product of the
  first "n" integers. */

int product (int n) {
 int i;
 int total = 1;
 for (i = 1; i <= n; i++)
  total *= i;
 return(total);
}
ls
hello		hello.c		hello.dSYM	help.c		main.c

First we can make the two objects:

gcc -Wall -g -c main.c

But here we get two errors, along with a warning:

main.c:10:1: warning: return type of 'main' is not 'int' [-Wmain-return-type]
void main () {
^
main.c:10:1: note: change return type to 'int'
void main () {
^~~~
int
main.c:14:52: error: implicit declaration of function 'sum' is invalid in C99
      [-Werror,-Wimplicit-function-declaration]
 printf("The sum of the first n integers is %d\n", sum(n));
                                                   ^
main.c:15:56: error: implicit declaration of function 'product' is invalid in C99
      [-Werror,-Wimplicit-function-declaration]
 printf("The product of the first n integers is %d\n", product(n));
                                                       ^
1 warning and 2 errors generated.

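We can fix this by telling main about the functions, adding these declarations (prototypes) near the top of main.c, before main:

int sum(int n);
int product(int n);

A declaration tells the compiler each function’s name, parameter types, and return type, so it can check the calls in main even though the definitions are in help.c.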

nano main.c
gcc -Wall -g -c main.c
main.c:13:1: warning: return type of 'main' is not 'int' [-Wmain-return-type]
void main () {
^
main.c:13:1: note: change return type to 'int'
void main () {
^~~~
int
1 warning generated.

and then the helper code

gcc -Wall -g -c help.c
ls
hello		hello.dSYM	help.o		main.o
hello.c		help.c		main.c

Tip

One reason we split code into multiple files is to make it readable, but another is what we just did: we can compile each file separately. When your code is large and compiling takes a long time, splitting it means you only have to recompile the file(s) you recently changed and then relink, instead of recompiling everything.

Finally, we link them and list the directory:

gcc -o demo main.o help.o -lm
ls
demo		hello.c		help.c		main.c
hello		hello.dSYM	help.o		main.o
./demo
Please enter a small positive integer: 4
The sum of the first n integers is 10
The product of the first n integers is 24
./demo
Please enter a small positive integer: 7
The sum of the first n integers is 28
The product of the first n integers is 5040
cat main.o
????????
        __text__TEXTg?t
?__cstring__TEXTg}__debug_str__DWARF???__debug_abbrev__DWARF?GB__debug_info__DWARF?^?__apple_names__DWARF?<?__apple_objc__DWARF{$#__apple_namespac__DWARF?$G__apple_types__DWARF?Gk__compact_unwind__LD ?__eh_frame__TEXT0@?
                                h__debug_line__DWARFp2

4	$
         PUH??H??H?=X??H?=rH?u??E????}??E??H?=U?ư??}??E??H?=a?ư?H??]?Please enter a small positive integer: %dThe sum of the first n integers is %d
The product of the first n integers is %d
Apple clang version 12.0.0 (clang-1200.0.32.2)main.c/Library/Developer/CommandLineTools/SDKs/MacOSX.sdkMacOSX.sdk/Users/brownsarahm/Documents/inclass/systems/compilecmainnint%?|?:
                                                                                     ;
                                                                                      ?4:
                                                                                         ;
                                                                                          I$>


                                                                                             Z
                                                                                            /6?|?V?HSAH
         j?|,?2HSAH
                   ????HSAH
                           ????HSAH
                                   0??
                                      4?V$gzRx
main.c                                       ?$????????gA?C

??K4fX@?8fX?]-TM-B-92-'--
                         3&+
_product_main_sum_printf_scanf

15.10. What does the -o option do?

we can remove it to see

gcc main.o help.o -lm
ls

Without -o, gcc uses the default name a.out for the executable:

a.out		hello		hello.dSYM	help.o		main.o
demo		hello.c		help.c		main.c
ls -ls
total 352
104 -rwxr-xr-x  1 brownsarahm  staff  50072 Oct 31 17:28 a.out
104 -rwxr-xr-x  1 brownsarahm  staff  50072 Oct 31 17:23 demo
104 -rwxr-xr-x  1 brownsarahm  staff  49688 Oct 31 17:05 hello
  8 -rw-r--r--  1 brownsarahm  staff     63 Oct 31 16:44 hello.c
  0 drwxr-xr-x  3 brownsarahm  staff     96 Oct 31 17:05 hello.dSYM
  8 -rw-r--r--  1 brownsarahm  staff    476 Oct 31 17:10 help.c
  8 -rw-r--r--  1 brownsarahm  staff   2364 Oct 31 17:18 help.o
  8 -rw-r--r--  1 brownsarahm  staff    381 Oct 31 17:17 main.c
  8 -rw-r--r--  1 brownsarahm  staff   2392 Oct 31 17:18 main.o
./a.out
Please enter a small positive integer: 9
The sum of the first n integers is 45
The product of the first n integers is 362880

so it still works without specifying a name for the executable, but it’s a lot neater to give it a meaningful name.

15.11. Is the executable file binary?

Yes. We’ll discuss this more carefully next class. It is binary, but cat sends the raw bytes to the terminal, which interprets each byte (8 bits) as a character. Since that’s not how the file was written, we get random-looking characters.

cat demo
????X? H__PAGEZERO?__TEXT@@__text__TEXT>?>?__stubs__TEXT?>
                                                          ??__stub_helper__TEXT
                                                                               ?$
                                                                                 ??__cstring__TEXT0?}0?__unwind_info__TEXT??H???__DATA_CONST@@@@__got__DATA_CONST@?__DATA?@?@__la_symbol_ptr__DATA??__data__DATA?H__LINKEDIT?@??"?? ? @?H??h?0
                                                  PP?
                                                      /usr/lib/dyld?~U`<???)#?>?2

a*(?>
     8d/usr/lib/libSystem.B.dylib&?)??UH??H??H?=???H?=+H?u??E?????}??E??:H?=?ư??}??E??_H?=?ư?H??]Ð????????UH??}??E??E??E?;E???E?E?E?E????E???????E?]?UH??}??E??E??E?;E???E??E?E?E????E???????E?]??%?@?%?@L??@AS?%??h?????h?????Please enter a small positive integer: %dThe sum of the first n integers is %d
The product of the first n integers is %d
>44?>4
      ?&?#R@dyld_stub_binderQr?s@_printf?@_scanf?__mh_execute_header/main3sum8product=?|?}?}?|p@?Yd?d?f?;`c.>?$>$gNgdYd?d?f?;`c.?>$?>$@N@.?>#$?>$>N>d>?>%?>*29__mh_execute_header_main_product_sum_printf_scanfdyld_stub_binder__dyld_private/Users/brownsarahm/Documents/inclass/systems/compilec/main.c/Users/brownsarahm/Documents/inclass/systems/compilec/main.o_mainhelp.c/Users/brownsarahm/Documents/inclass/systems/compilec/help.o_sum_product

15.12. Review today’s class

  1. Update more rows on your KWL Chart based on what we did today.

  2. Practice using gcc. Repeat steps we did in class, change the order of parameters; try skipping steps to produce errors, etc. Then in gcctips.md summarize what you learned as a list of tips and reminders on what the parameters do/why/when you would need them (or not). (to reinforce what we learned)

  3. Contribute to your group repo and review a teammate’s PR.

15.13. Prepare for Next Class

  1. Create operators.md and make some notes about what you know about operators. What kinds of operators are you familiar with? Which have you seen in programming? math?

15.14. More Practice

  1. (priority) Write two short programs that do the same thing in different ways and compile them both to assembly (eg using a for vs while loop to sum numbers up to a number). Check the assembly to see if they produce the same thing or if it’s different. Save your code (in code blocks) and notes about your findings in assemblycompare.md

15.15. Questions After Class

15.15.1. Is there a benefit to linking object files like we did today rather than just compiling a larger file with all of the functions declared and defined?

Linking has to happen whenever you use libraries; we did each step by hand to illustrate what happens. Splitting code into several files also makes it easier to read than one large file, and it can cut compilation time while you are working, since you only have to recompile the files you edited recently.

15.15.2. is C the closest low level program that’s common now since no one uses assembly?

15.15.3. What step of the compilation process is different from OS to OS?

A lot of the differences are actually at the hardware level (the processor’s instruction set), even within one OS, but the object code and executables are specific to a system or type of system. For example, macOS stores them in the Mach-O format, while Linux uses ELF.

15.15.4. what can you build with assembly language?

Anything!

15.15.5. Where can I read documentation on parameters and meanings?

Run man gcc in the terminal to read the gcc manual page, which lists the options and what they do.

15.15.6. Are the weird symbols from the machine code mean anything to us? or is the computer trying to make sense of something with alphabetical significance?

This is machine code: binary that is designed for your hardware to interpret as instructions.

When we display it in the terminal, the computer tries to interpret that binary as text, so it mostly ends up as weird characters.

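15.15.7. I thought the final step was to convert code into binary? Is that something that we can’t look at?

The object code and the executable are both binary. We can look at them, but cat shows them as mostly garbled text; a hex viewer such as xxd shows the actual bytes.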

15.15.8. Just confirming? To get an executable a compiler basically takes in the high level code to assembly which is then outputted to object code that is then linked doing “gcc -o -lm” to create it?

Yes!! (Preprocessing happens first, before compiling to assembly.)

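15.15.9. One thing I’m curious about is if it’s possible to distinguish between compile errors and linking errors at the very least, more on if anything can be done about them when they pop up.

Delete the main and help object files, keep the declarations in main.c, and then try gcc -Wall -g main.c. You’ll get a linker error, why? (If you also remove the declarations, you may get the implicit declaration errors we saw above first.) Compile errors point to a file, line, and column, like main.c:14:52, while linker errors come from the linker (ld) and usually mention undefined symbols.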