24. Systems Programming and threading

24.1. Systems vs Application Programming

  • most of you will do a lot more application programming than systems programming

  • knowing how the systems stuff works is important because you will need to interact with it.

24.2. System interaction in Python

We’ll use the python interpreter a bit again today.

python
Python 3.8.6 (v3.8.6:db455296be, Sep 23 2020, 13:31:39)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

In order to interact with the system in a Python program, we use the python os library.

import os

It has methods that look a lot like a lot of the bash commands we have used.

For example listdir is like ls

os.listdir()
['test', 'tiny-book', 'checker.sh', 'compilec', 'README', '2022-04-07.md', 'nand2tetris', '.ipynb_checkpoints', 'courseutils', 'test2', '2022-04-07.ipynb', 'github-in-class-brownsarahm']

We could then use this as a python list object.

file_list = os.listdir()
file_list
['test', 'tiny-book', 'checker.sh', 'compilec', 'README', '2022-04-07.md', 'nand2tetris', '.ipynb_checkpoints', 'courseutils', 'test2', '2022-04-07.ipynb', 'github-in-class-brownsarahm']

This library can also help us interact with the system in a more abstract way, for example it can hide what varies per operating system.

For example the following works on Linux and MacOS, but would not work on Windows.

with open('tiny-book/intro.md','r') as f:
    f.read()

The path module within the library will join parts of a path together with the right delimiter. The following will output differently depending on what operating system you run it on

os.path.join('tiny-book','intro.md')

On Unix-based systems

'tiny-book/intro.md'

on Windows it would be:

'tiny-book\\intro.md'

Knowing that tools like this exist allows you to interface with the system and write different types of programs.

This is an important type of library to know exists so that you can look up the specific functions and modules you would need in any given scenario.

and we can exit Python now.

exit()

24.3. Threading

We are going to pretend that summing squares of numbers is expensive computation and we want to spread it across different multiple cores. To do that, we spread it into separate threads.

nano sq_sum_threaded.c

The program will be:

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

/* single global variable */
/* shared, accessible, modifiable by all threads */
int accum = 0;

void* square(void* x) {
    int xi = (int)x;
    accum += xi * xi;
    return NULL; /* nothing to return, prevent warning */
}

int main(int argc, char** argv) {
    int i;
    pthread_t ths[20];
    for (i = 0; i < 20; i++) {
        pthread_create(&ths[i], NULL, square, (void*)(i + 1));
    }

    for (i = 0; i < 20; i++) {
        void* res;
        pthread_join(ths[i], &res);
    }

    printf("accum = %d\n", accum);
    return 0;
}

Then we can build the program.

gcc -pthread -Wall -g -o sqsum sq_sum_threaded.c -lm

We can both compile and link it at once and we get just a warning

sq_sum_threaded.c:19:43: warning: cast to 'void *' from smaller integer type
      'int' [-Wint-to-void-pointer-cast]
    pthread_create(&ths[i], NULL, square, (void*)(i + 1));
                                          ^
1 warning generated.

and we can run the program

./sqsum

I got the wrong answer

accum = 2833

We saw that different students got different answers.

We can use a for loop in bash to explore this further and figure out why.

for i in {1..1000}; do ./sqsum; done | sort | uniq -c

this also uses some new bash commands:

  • sort orders all of the outputs of the 1000 runs of our program

  • and uniq with the -c option counts how many times any given result appeart multiple times

   1 accum = 2701
   1 accum = 2726
   1 accum = 2749
   1 accum = 2770
   1 accum = 2814
   1 accum = 2820
   1 accum = 2821
   1 accum = 2833
   4 accum = 2834
   1 accum = 2841
   8 accum = 2845
   8 accum = 2854
  10 accum = 2861
   4 accum = 2866
   2 accum = 2869
 955 accum = 2870

So, this time I got the right answer most of the times 955 out of 1000, but lots of other answers at least once.

To understand what happens, lets look at the following program, which should be an equivalent way to implement the body of the square function.

int temp = accum;
temp += x * x;
accum = temp;

In this one, we first copy the accum value to a temporary variable, then square the value and add that to temp, and then finally add that value back to accum. This should be equivalent to the program above, result wise.

Even though this is not how we wrote our program, this is actually what it has to do, as we spin out each process.

This table traces through what occurs in two threads.

// Thread 1

// Thread 2

Status

int temp1 = accum;

int temp2 = accum;

now temp1 = temp2 = 0

`temp2 += 2 * 2;

now temp2 = 4

temp1 += 1 * 1;

// temp1 = 1

accum = temp1;

// accum = 1

accum = temp2;

// accum = 4

So, what happens is each thread looks at the current value of accum and stores it to a thread-specific temporary variable. Each thread has its own memory, but they do all share the global variables.

Then thread 2 completes its calculation and updates temp2 and thread 1 updates temp1. So far everything is okay, but next thread 1 writes to accum and sets it to 1, and finally thread to writes to accum and makes it 4. The two values from the threads did not get added together, because thread 2 started before thread 1 finished.

So, we end up losing some of the values.

24.4. Locking

We can instead change our square function as follows:

int accum = 0;
pthread_mutex_t accum_mutex = PTHREAD_MUTEX_INITIALIZER;

void* square(void* x) {
    int xi = (int)x;
    int temp = xi * xi;

    pthread_mutex_lock(&accum_mutex);
    accum += temp;
    pthread_mutex_unlock(&accum_mutex);

    return NULL; /* nothing to return, prevent warning */
}

and save this to a new file sq_sum_threaded_m.c

This version uses something from the pthread library, to create a lock.

Now when it executes each thread will do the calculation part on it’s own time, possibly simultaneously. Then the lock part means that they will each take turns to add their value to the global variable accum.

We can the build it

gcc -pthread -Wall -g -o sqsum_m sq_sum_threaded_m.c -lm
sq_sum_threaded_m.c:26:43: warning: cast to 'void *' from smaller integer type 'int'
      [-Wint-to-void-pointer-cast]
    pthread_create(&ths[i], NULL, square, (void*)(i + 1));
                                          ^
1 warning generated.

we get the same warning

Now we can run it again, using our loop

for i in {1..1000}; do ./sqsum_m; done | sort | uniq -c

and now we get the same result all 1000 times.

1000 accum = 2870

24.5. Prepare for next class

  1. (Priority) Work with a partner to add "advice to future students" testimonials to website for this offering and "why take this class" testimonials to the main course site using an integration manager like workflow. Coordinate to determine who's fork will be used for integration of each . Each of you should clone that fork and add your contribution to resources/testimonials.md and fromstudents.md on a branch, <initials>_testimonial eg Professor Brown's would be smb_testimnial. Then merge the two contributions to a single testimonial branch. (both to help future students and to practice with branches and collaboration, we will work with these further in class next week)

  2. Review the notes.

  3. Update your KWL chart. Check that all rows are included.

  4. Simulate a more computationally intensive program using the sleep function in C and compare the time of a threaded vs single threaded version of the program. Include the two programs, output of your test, explanation text, and conclusions in singlevmultithread.md (to better illustrate the impact of the threads)

24.6. More Practice

  1. Learn about the system libraries in two languages (one can be C or Python, one must be something else). Find the name(s) of the library or libraries. In systeminteraction.md make notes for each language about what categories of functions they provide.

  2. (priority) Research examples of programs using multi-threading besides splitting up a single calculation for time reasons, include three examples in whymultithread.md.

24.7. Questions After Class

24.7.1. Are there certain languages or systems that lock for you?

Different languages handle threading different ways. I do not know all of them. I would expect that there are some libraries and frameworks that help handle it for you.

The important reason to use a lower level language like C to explore this, where things are not handled for you, to learn the concepts and see all of the steps that have to occur. This helps you when it is supposed to be handled, but something goes wrong.

24.7.2. Could you recommend any extra readings about this?

You will learn more about threading in CSC412 and you can optionally take CSC415 parallel programming.

Threading and concurrency is a generally hard to grasp topic, so there are lots of tutorials and examples online in lots of different languages.

High Performance Computing Carpentry has a workshop. It begins with why you would use parallel computing and then shows how to do some operations on a cluster. You could work through this on the cluster we used in class.

24.7.3. Can this cause a deadlock?

Yes, deadlock is a possible outcome of having multiple threads. The particular program we wrote today, would not, but using threads for more co-dependent components of a program could.

24.7.4. What kinda of signs would be visible that we need to make our program multithreaded?

Researching more examples and then brainstorming a few based on the ones you find is actually a part of the more practice for today’s class.

The general case is just when there are two or more things that need to happen “at the same time” not in sequence, even if they can sort of take turns.

24.7.5. One question I have is why would you want to split up your problem among different computers? Is it just so that more parts of your problem can get done at the same time?

High Performance Computing Carpentry has a good description of why you would want to split a program across multiple computers.