13. How can I work on a remote server?

Today we will connect to a remote server, use an HPC system with scheduler, and learn new bash commands for working with the content of files.

13.1. What are remote servers and HPC systems?

diagram illustrating a remot connection to a login node and compute cluster

13.2. Connecting to Seawulf

We connect with secure shell or ssh from our terminal (GitBash or Putty on windows) to URI’s teaching High Performance Computing (HPC) Cluster Seawulf.

Our login is the part of your uri e-mail address before the @ and I will tell you how to find your default password if you missed class.

ssh -l brownsarahm seawulf.uri.edu

When it logs in it looks like this and requires you to change your password. They configure it with a default and with it past expired.

The authenticity of host 'seawulf.uri.edu (131.128.217.210)' can't be established.
ECDSA key fingerprint is SHA256:RwhTUyjWLqwohXiRw+tYlTiJEbqX2n/drCpkIwQVCro.
Are you sure you want to continue connecting (yes/no/[fingerprint])? y
Please type 'yes', 'no' or the fingerprint: yes
Warning: Permanently added 'seawulf.uri.edu,131.128.217.210' (ECDSA) to the list of known hosts.
brownsarahm@seawulf.uri.edu's password:
You are required to change your password immediately (root enforced)
WARNING: Your password has expired.
You must change your password now and login again!
Changing password for user brownsarahm.
Changing password for brownsarahm.
(current) UNIX password:
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Connection to seawulf.uri.edu closed.

after you give it a new password, then it logs you out and you have to log back in.

brownsarahm@~ $ ssh -l brownsarahm seawulf.uri.edu
brownsarahm@seawulf.uri.edu's password:
Last login: Tue Mar  8 12:52:38 2022 from 172.20.133.152

We have logged into our home directory which is empty

[brownsarahm@seawulf ~]$ ls
[brownsarahm@seawulf ~]$ pwd
/home/brownsarahm
[brownsarahm@seawulf ~]$ whoami
brownsarahm

Notice that the prompt says uriusername@seawulf to indicate that you are logged into the server, not working locally.

13.3. Downloading files

[brownsarahm@seawulf ~]$ wget http://www.hpc-carpentry.org/hpc-shell/files/bash-lesson.tar.gz
--2022-03-08 12:58:09--  http://www.hpc-carpentry.org/hpc-shell/files/bash-lesson.tar.gz
Resolving www.hpc-carpentry.org (www.hpc-carpentry.org)... 104.21.33.152, 172.67.146.136
Connecting to www.hpc-carpentry.org (www.hpc-carpentry.org)|104.21.33.152|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12534006 (12M) [application/gzip]
Saving to: ‘bash-lesson.tar.gz’

100%[======================================>] 12,534,006  4.19MB/s   in 2.9s   

2022-03-08 12:58:12 (4.19 MB/s) - ‘bash-lesson.tar.gz’ saved [12534006/12534006]

[brownsarahm@seawulf ~]$ ls
bash-lesson.tar.gz
[brownsarahm@seawulf ~]$ tar -xvf bash-lesson.tar.gz
dmel-all-r6.19.gtf
dmel_unique_protein_isoforms_fb_2016_01.tsv
gene_association.fb
SRR307023_1.fastq
SRR307023_2.fastq
SRR307024_1.fastq
SRR307024_2.fastq
SRR307025_1.fastq
SRR307025_2.fastq
SRR307026_1.fastq
SRR307026_2.fastq
SRR307027_1.fastq
SRR307027_2.fastq
SRR307028_1.fastq
SRR307028_2.fastq
SRR307029_1.fastq
SRR307029_2.fastq
SRR307030_1.fastq
SRR307030_2.fastq
[brownsarahm@seawulf ~]$ ls
bash-lesson.tar.gz                           SRR307026_1.fastq
dmel-all-r6.19.gtf                           SRR307026_2.fastq
dmel_unique_protein_isoforms_fb_2016_01.tsv  SRR307027_1.fastq
gene_association.fb                          SRR307027_2.fastq
SRR307023_1.fastq                            SRR307028_1.fastq
SRR307023_2.fastq                            SRR307028_2.fastq
SRR307024_1.fastq                            SRR307029_1.fastq
SRR307024_2.fastq                            SRR307029_2.fastq
SRR307025_1.fastq                            SRR307030_1.fastq
SRR307025_2.fastq                            SRR307030_2.fastq

13.4. Working with large files

One of these files, contains the entire genome for the common fruitfly, let’s take a look at it: \

[brownsarahm@seawulf ~]$ cat dmel-all-r6.19.gtf
X	FlyBase	gene	19961297	19969323	.	+	.	gene_id "FBgn0031081"; gene_symbol "Nep3";
2L	FlyBase	3UTR	782825	782885	.	+	.	gene_id "FBgn0041250"; gene_symbol "Gr21a"; transcript_id "FBtr0331651"; transcript_symbol "Gr21a-RB";

Warning

this output is truncated for display purposes

We see that this actually take a long time to output and is way tooo much information to actually read. In fact, in order to make the website work, I had to cut that content using command line tools, my text editor couldn’t open the file and GitHub was unhappy when I pushed it.

For a file like this, we don’t really want to read the whole file but we do need to know what it’s strucutred like in order to design programs to work with it.

head lets us look at the first 10 lines.

[brownsarahm@seawulf ~]$ head dmel-all-r6.19.gtf
X	FlyBase	gene	19961297	19969323	.	+	.	gene_id "FBgn0031081"; gene_symbol "Nep3";
X	FlyBase	mRNA	19961689	19968479	.	+	.	gene_id "FBgn0031081"; gene_symbol "Nep3"; transcript_id "FBtr0070000"; transcript_symbol "Nep3-RA";
X	FlyBase	5UTR	19961689	19961845	.	+	.	gene_id "FBgn0031081"; gene_symbol "Nep3"; transcript_id "FBtr0070000"; transcript_symbol "Nep3-RA";
X	FlyBase	exon	19961689	19961845	.	+	.	gene_id "FBgn0031081"; gene_symbol "Nep3"; transcript_id "FBtr0070000"; transcript_symbol "Nep3-RA";
X	FlyBase	exon	19963955	19964071	.	+	.	gene_id "FBgn0031081"; gene_symbol "Nep3"; transcript_id "FBtr0070000"; transcript_symbol "Nep3-RA";
X	FlyBase	exon	19964782	19964944	.	+	.	gene_id "FBgn0031081"; gene_symbol "Nep3"; transcript_id "FBtr0070000"; transcript_symbol "Nep3-RA";
X	FlyBase	exon	19965006	19965126	.	+	.	gene_id "FBgn0031081"; gene_symbol "Nep3"; transcript_id "FBtr0070000"; transcript_symbol "Nep3-RA";
X	FlyBase	exon	19965197	19965511	.	+	.	gene_id "FBgn0031081"; gene_symbol "Nep3"; transcript_id "FBtr0070000"; transcript_symbol "Nep3-RA";
X	FlyBase	exon	19965577	19966071	.	+	.	gene_id "FBgn0031081"; gene_symbol "Nep3"; transcript_id "FBtr0070000"; transcript_symbol "Nep3-RA";
X	FlyBase	exon	19966183	19967012	.	+	.	gene_id "FBgn0031081"; gene_symbol "Nep3"; transcript_id "FBtr0070000"; transcript_symbol "Nep3-RA";

We can use the -n parameter to change the number.

And, tails shows the last few.

[brownsarahm@seawulf ~]$ tail dmel-all-r6.19.gtf

which in this case looks mostly the same

2L	FlyBase	exon	782124	782181	.	+	.	gene_id "FBgn0041250"; gene_symbol "Gr21a"; transcript_id "FBtr0331651"; transcript_symbol "Gr21a-RB";
2L	FlyBase	exon	782238	782441	.	+	.	gene_id "FBgn0041250"; gene_symbol "Gr21a"; transcript_id "FBtr0331651"; transcript_symbol "Gr21a-RB";
2L	FlyBase	exon	782495	782885	.	+	.	gene_id "FBgn0041250"; gene_symbol "Gr21a"; transcript_id "FBtr0331651"; transcript_symbol "Gr21a-RB";
2L	FlyBase	start_codon	781297	781299	.	+	0	gene_id "FBgn0041250"; gene_symbol "Gr21a"; transcript_id "FBtr0331651"; transcript_symbol "Gr21a-RB";
2L	FlyBase	CDS	781297	782048	.	+	0	gene_id "FBgn0041250"; gene_symbol "Gr21a"; transcript_id "FBtr0331651"; transcript_symbol "Gr21a-RB";
2L	FlyBase	CDS	782124	782181	.	+	1	gene_id "FBgn0041250"; gene_symbol "Gr21a"; transcript_id "FBtr0331651"; transcript_symbol "Gr21a-RB";
2L	FlyBase	CDS	782238	782441	.	+	0	gene_id "FBgn0041250"; gene_symbol "Gr21a"; transcript_id "FBtr0331651"; transcript_symbol "Gr21a-RB";
2L	FlyBase	CDS	782495	782821	.	+	0	gene_id "FBgn0041250"; gene_symbol "Gr21a"; transcript_id "FBtr0331651"; transcript_symbol "Gr21a-RB";
2L	FlyBase	stop_codon	782822	782824	.	+	0	gene_id "FBgn0041250"; gene_symbol "Gr21a"; transcript_id "FBtr0331651"; transcript_symbol "Gr21a-RB";
2L	FlyBase	3UTR	782825	782885	.	+	.	gene_id "FBgn0041250"; gene_symbol "Gr21a"; transcript_id "FBtr0331651"; transcript_symbol "Gr21a-RB";

We can also see how much content is in the file wc give a word count and with its -l parameter gives us the number of lines.

[brownsarahm@seawulf ~]$ wc -l dmel-all-r6.19.gtf
542048 dmel-all-r6.19.gtf

Over five hundred forty thousand lines is a lot.

How can we get the number of lines in each of the .fastq files?

[brownsarahm@seawulf ~]$ wc -l *.fastw
wc: *.fastw: No such file or directory

note that in my typo, it tells me no files matched my pattern.

[brownsarahm@seawulf ~]$ wc -l *.fastq
   20000 SRR307023_1.fastq
   20000 SRR307023_2.fastq
   20000 SRR307024_1.fastq
   20000 SRR307024_2.fastq
   20000 SRR307025_1.fastq
   20000 SRR307025_2.fastq
   20000 SRR307026_1.fastq
   20000 SRR307026_2.fastq
   20000 SRR307027_1.fastq
   20000 SRR307027_2.fastq
   20000 SRR307028_1.fastq
   20000 SRR307028_2.fastq
   20000 SRR307029_1.fastq
   20000 SRR307029_2.fastq
   20000 SRR307030_1.fastq
   20000 SRR307030_2.fastq
  320000 total

when it does work, we also get the total.

We can use redirects as before to save these to a file:

[brownsarahm@seawulf ~]$ wc -l *.fastq > linecounts.txt
[brownsarahm@seawulf ~]$ cat linecounts.txt
   20000 SRR307023_1.fastq
   20000 SRR307023_2.fastq
   20000 SRR307024_1.fastq
   20000 SRR307024_2.fastq
   20000 SRR307025_1.fastq
   20000 SRR307025_2.fastq
   20000 SRR307026_1.fastq
   20000 SRR307026_2.fastq
   20000 SRR307027_1.fastq
   20000 SRR307027_2.fastq
   20000 SRR307028_1.fastq
   20000 SRR307028_2.fastq
   20000 SRR307029_1.fastq
   20000 SRR307029_2.fastq
   20000 SRR307030_1.fastq
   20000 SRR307030_2.fastq
  320000 total

We can also search files, without loading them all into memory or displaying them, with grep:

[brownsarahm@seawulf ~]$ grep Act5c dmel-all-r6.19.gtf
[brownsarahm@seawulf ~]$ grep mRNA dmel-all-r6.19.gtf

this output a lot, so the output is truncated here

X	FlyBase	mRNA	19961689	19968479	.	+	.	gene_id "FBgn0031081"; gene_symbol "Nep3"; transcript_id "FBtr0070000"; transcript_symbol "Nep3-RA";
2L	FlyBase	mRNA	781276	782885	.	+	.	gene_id "FBgn0041250"; gene_symbol "Gr21a"; transcript_id "FBtr0331651"; transcript_symbol "Gr21a-RB";

and we can combine grep with wc to count occurences.

[brownsarahm@seawulf ~]$ grep mRNA dmel-all-r6.19.gtf | wc -l
34025

13.5. File permissions

Let’s make a small script, recalling what we have learned so far:

[brownsarahm@seawulf ~]$ echo "echo 'script works'" >> demo.sh

We can confirm that the script looks like a we expected

[brownsarahm@seawulf ~]$ cat demo.sh
echo 'script works'

One thing we could do is to run the script using ./

[brownsarahm@seawulf ~]$ ./demo.sh

but we get a permission denied error

-bash: ./demo.sh: Permission denied

By default, files have different types of permissions: read, write, and execute for different users that can access them. To view the permissions, we can use the -l option of ls.

[brownsarahm@seawulf ~]$ ls -l
total 138452
-rw-r--r--. 1 brownsarahm spring2022-csc392 12534006 Apr 18  2021 bash-lesson.tar.gz
-rw-r--r--. 1 brownsarahm spring2022-csc392       20 Mar  8 13:12 demo.sh
-rw-r--r--. 1 brownsarahm spring2022-csc392 77426528 Jan 16  2018 dmel-all-r6.19.gtf
-rw-r--r--. 1 brownsarahm spring2022-csc392   721242 Jan 25  2016 dmel_unique_protein_isoforms_fb_2016_01.tsv
-rw-r--r--. 1 brownsarahm spring2022-csc392 25056938 Jan 25  2016 gene_association.fb
-rw-r--r--. 1 brownsarahm spring2022-csc392      447 Mar  8 13:07 linecounts.txt
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625262 Jan 25  2016 SRR307023_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625262 Jan 25  2016 SRR307023_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625376 Jan 25  2016 SRR307024_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625376 Jan 25  2016 SRR307024_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625286 Jan 25  2016 SRR307025_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625286 Jan 25  2016 SRR307025_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625302 Jan 25  2016 SRR307026_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625302 Jan 25  2016 SRR307026_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625312 Jan 25  2016 SRR307027_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625312 Jan 25  2016 SRR307027_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625338 Jan 25  2016 SRR307028_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625338 Jan 25  2016 SRR307028_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625390 Jan 25  2016 SRR307029_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625390 Jan 25  2016 SRR307029_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625318 Jan 25  2016 SRR307030_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625318 Jan 25  2016 SRR307030_2.fastq

For each file we get 10 characters in the first column that describe the permissions. The 3rd column is the username of the owner, the fourth is the group, then size date revised and the file name.

We are most interested in the 10 character permissions. The fist column indicates if any are directories with a d or a - for files. We have no directories, but we can create one to see this.

[brownsarahm@seawulf ~]$ mkdir results
[brownsarahm@seawulf ~]$ ls -l
total 138452
-rw-r--r--. 1 brownsarahm spring2022-csc392 12534006 Apr 18  2021 bash-lesson.tar.gz
-rw-r--r--. 1 brownsarahm spring2022-csc392       20 Mar  8 13:12 demo.sh
-rw-r--r--. 1 brownsarahm spring2022-csc392 77426528 Jan 16  2018 dmel-all-r6.19.gtf
-rw-r--r--. 1 brownsarahm spring2022-csc392   721242 Jan 25  2016 dmel_unique_protein_isoforms_fb_2016_01.tsv
-rw-r--r--. 1 brownsarahm spring2022-csc392 25056938 Jan 25  2016 gene_association.fb
-rw-r--r--. 1 brownsarahm spring2022-csc392      447 Mar  8 13:07 linecounts.txt
**drwxr-xr-x. 2 brownsarahm spring2022-csc392       10 Mar  8 13:16 results**
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625262 Jan 25  2016 SRR307023_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625262 Jan 25  2016 SRR307023_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625376 Jan 25  2016 SRR307024_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625376 Jan 25  2016 SRR307024_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625286 Jan 25  2016 SRR307025_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625286 Jan 25  2016 SRR307025_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625302 Jan 25  2016 SRR307026_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625302 Jan 25  2016 SRR307026_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625312 Jan 25  2016 SRR307027_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625312 Jan 25  2016 SRR307027_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625338 Jan 25  2016 SRR307028_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625338 Jan 25  2016 SRR307028_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625390 Jan 25  2016 SRR307029_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625390 Jan 25  2016 SRR307029_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625318 Jan 25  2016 SRR307030_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625318 Jan 25  2016 SRR307030_2.fastq

We can see in the bold line, that the first character is a d.

The next nine characters indicate permission to Read, Write, and eXecute a file. With either the letter or a - for permissions not granted, they appear in three groups of three, three characters each for owner, group, anyone with access.

If we want to run the file, we can instead use bash directly, but this is limited relative to calling our script in other ways.

[brownsarahm@seawulf ~]$ bash demo.sh
script works

Instead, to add execute permission, we can use chmod

[brownsarahm@seawulf ~]$ chmod +x demo.sh
[brownsarahm@seawulf ~]$ ls -l
total 138452
-rw-r--r--. 1 brownsarahm spring2022-csc392 12534006 Apr 18  2021 bash-lesson.tar.gz
-rwxr-xr-x. 1 brownsarahm spring2022-csc392       20 Mar  8 13:12 demo.sh
-rw-r--r--. 1 brownsarahm spring2022-csc392 77426528 Jan 16  2018 dmel-all-r6.19.gtf
-rw-r--r--. 1 brownsarahm spring2022-csc392   721242 Jan 25  2016 dmel_unique_protein_isoforms_fb_2016_01.tsv
-rw-r--r--. 1 brownsarahm spring2022-csc392 25056938 Jan 25  2016 gene_association.fb
-rw-r--r--. 1 brownsarahm spring2022-csc392      447 Mar  8 13:07 linecounts.txt
drwxr-xr-x. 2 brownsarahm spring2022-csc392       10 Mar  8 13:16 results
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625262 Jan 25  2016 SRR307023_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625262 Jan 25  2016 SRR307023_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625376 Jan 25  2016 SRR307024_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625376 Jan 25  2016 SRR307024_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625286 Jan 25  2016 SRR307025_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625286 Jan 25  2016 SRR307025_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625302 Jan 25  2016 SRR307026_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625302 Jan 25  2016 SRR307026_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625312 Jan 25  2016 SRR307027_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625312 Jan 25  2016 SRR307027_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625338 Jan 25  2016 SRR307028_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625338 Jan 25  2016 SRR307028_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625390 Jan 25  2016 SRR307029_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625390 Jan 25  2016 SRR307029_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625318 Jan 25  2016 SRR307030_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625318 Jan 25  2016 SRR307030_2.fastq
[brownsarahm@seawulf ~]$ ./demo.sh
script works
[brownsarahm@seawulf ~]$ nano demo.sh

We can add a bit more to our script to make it more interesting

for VAR in *.gz
do
  echo $VAR
done


echo 'script works'

and note that tht does not change the permission.

[brownsarahm@seawulf ~]$ ls -l
total 138452
-rw-r--r--. 1 brownsarahm spring2022-csc392 12534006 Apr 18  2021 bash-lesson.tar.gz
-rwxr-xr-x. 1 brownsarahm spring2022-csc392       60 Mar  8 13:27 demo.sh
-rw-r--r--. 1 brownsarahm spring2022-csc392 77426528 Jan 16  2018 dmel-all-r6.19.gtf
-rw-r--r--. 1 brownsarahm spring2022-csc392   721242 Jan 25  2016 dmel_unique_protein_isoforms_fb_2016_01.tsv
-rw-r--r--. 1 brownsarahm spring2022-csc392 25056938 Jan 25  2016 gene_association.fb
-rw-r--r--. 1 brownsarahm spring2022-csc392      447 Mar  8 13:07 linecounts.txt
drwxr-xr-x. 2 brownsarahm spring2022-csc392       10 Mar  8 13:16 results
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625262 Jan 25  2016 SRR307023_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625262 Jan 25  2016 SRR307023_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625376 Jan 25  2016 SRR307024_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625376 Jan 25  2016 SRR307024_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625286 Jan 25  2016 SRR307025_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625286 Jan 25  2016 SRR307025_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625302 Jan 25  2016 SRR307026_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625302 Jan 25  2016 SRR307026_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625312 Jan 25  2016 SRR307027_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625312 Jan 25  2016 SRR307027_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625338 Jan 25  2016 SRR307028_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625338 Jan 25  2016 SRR307028_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625390 Jan 25  2016 SRR307029_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625390 Jan 25  2016 SRR307029_2.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625318 Jan 25  2016 SRR307030_1.fastq
-rw-r--r--. 1 brownsarahm spring2022-csc392  1625318 Jan 25  2016 SRR307030_2.fastq

13.6. Making a batch file

[brownsarahm@seawulf ~]$ nano my_job.sh
[brownsarahm@seawulf ~]$ sbatch my_job.sh
Submitted batch job 23950
[brownsarahm@seawulf ~]$ squeue --job 23950
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

13.7. Interactive mode

So far we have worked on the login node which has very little computational power and submitted a job to a compute node through the scheduler for it to run asynchronously. We can also use a compute node interactively if we have a computationally intensive task that we need to work with in real time, instead of it being fully automated.

We’ll talk after the break about how this might look, but for example if our grep commands needed more power where we need the resutl back right away in order to then decide what the next calculation could be.

We can request a node to be allocated to our user and have a direct ssh connection to compute node using interactive. Interactive by default calls a specific script that is configured by the system administrators for the particular cluster that you are logged into.

For seawulf, interactive opens a default of 8 hgours, but you can use parameters to request more or less time.

[brownsarahm@seawulf ~]$ interactive

When we run this interactive submits a “job” to the schedule, but instead of then passing a script to the compute node it connects us to that node via ssh

salloc: Granted job allocation 23963
salloc: Waiting for resource configuration
salloc: Nodes n005 are ready for job
/usr/bin/id: cannot find name for user ID 1168
/usr/bin/id: cannot find name for group ID 1111
/usr/bin/id: cannot find name for user ID 1168

We can see that we are connected to a system with more resources using this:

[I have no name!@n005 ~]$ lshw
WARNING: you should run this program as super-user.
          configuration: driver=system
WARNING: output may be incomplete or inaccurate, you should run this program as super-user.

I truncataed the long output.

We can then exit the compute node and go back to the login node with exit

[I have no name!@n005 ~]$ exit
logout
salloc: Relinquishing job allocation 23963

Note tht it says the job has ended.

13.8. Logging off

You can log off with exit it is important to do so to not use resources you do not need. However, note tht if your laptop loses internet connection your session will also be disconnected. This is another advantage of the scheduler, you don’t need to maintain a connection for it to run your code.

[brownsarahm@seawulf ~]$ exit
logout
Connection to seawulf.uri.edu closed.

Note that when you are back to your computer the prompt changes. And that using the up arrow, the last command your computer’s shell knows is the ssh line.

brownsarahm@~ $ ssh -l brownsarahm seawulf.uri.edu

13.9. Prepare for next class

  1. Review the notes (to reinforce ideas and improve your memory of them)

  2. priority add a ## summary section to your IDE notes where you summarize what you observed in your IDE usage and what questions or what features (we will work with this after break and I want you to prepare before break)

  3. Modify the sbatch job we made in class so that you get files back for any errors and output from the script. see the options (to review/practice what we did)

13.10. More Practice

  1. priority write a script that outputs the first 3 lines of each .fastq line and sbatch script that runs it and saves the output to files and e-mails you when it is done. Submit the job. (to reinforce /practice what we saw in class)

  2. File permissions are represented numerically often in octal, by transforming the permissions for each level (owner, group, all) as binary into a number. Add octal.md to your KWL repo and answer the following. Try to think through it on your own, but you may look up the answers, as long as you link to (or ideally cite using jupyterbook citations) a high quality source.

    1. Transform the permissions [`r--`, `rw-`, `rwx`] to octal, by treating it as three bits.
    1. Transform the permission we changed our script to `rwxr-xr-x` to octal.
    1. Which of the following would prevent a file from being edited by other people 777 or 755?
    
  3. Answer the following in hpc.md of your KWL repo: (to think about how the design of the system we used in class impacts programming and connect it to other ideas taught in CS)

    1. What kinds of things would your code need to do if you were going to run it on an HPC system?
    1. What Sbatch options seem the most helpful?
    1. How might you go about setting the time limits for a script? How could you estimate how long it will take?
    

13.11. Questions After Class

13.11.1. When is sbatch used?

Sbatch is one type of scheduler, that manages the resource of a high performance computing system

13.11.2. will sbatch kick out your job after a certain amount of time? what if your job was indefinite like hosting a website.

High performance computing systems, like the cluster that we logged into in class are ued for computationally expensive programs that might require more RAM and/or more CPUs in order to run and/or to run for a longer time than your laptop makes sense to leave it in one place.

For example, this is often used in research data analysis, like the genomics data we used today or for training machine learning models.

HPCs are a specifc type of remote resource, for indefinite jobs, we use different types of servers. However, ssh can still be used to log into those sytems.