Beginner Guide to HPC3

This is a step-by-step beginner guide that explains how to get an account, log in, and do a few simple things on HPC3.

1. Acceptable Use Policy

HPC3 is a shared facility. Please be aware that there may be 100+ active users logged in at any particular time. What you do can have dramatic effects on others. Please read and abide by the Acceptable Use Policy for resources managed by the Research Cyberinfrastructure Center at UCI. Violations of this policy or any other applicable University policies may result in the temporary or permanent removal of accounts associated with research computing.

2. Your Laptop

You will need a few applications on your laptop; most are standard:

software to access UCI VPN

Install this software according to the instructions provided in UCI campus VPN.

Terminal

Which application to use depends on what laptop/desktop you have:

Linux: use your favorite terminal application

Mac: Terminal or iTerm2

Windows: PuTTY or MobaXterm

Windows 10: Windows Terminal, Windows Subsystem for Linux, or MobaXterm

ssh and scp

Both applications are standard on Mac, Linux, and Windows 10. They allow you to securely connect to remote servers and transfer data.

rsync

Can optionally be used in place of scp. Standard on Mac and Linux; on Windows 10 it is available with the Windows Subsystem for Linux. Used for transferring files to/from remote servers.

Filezilla or WinSCP

Graphical file transfer programs for Windows laptops.

MobaXterm users: DO NOT enable Remote monitoring! This is an experimental feature of MobaXterm that runs multiple unnecessary processes on the login node under your account. These processes add to the overall load on the cluster, and none of the information they collect is useful for your work on the cluster.

3. Get an account

Please send an email to hpc-support@uci.edu and provide your name and your UCINetID. The email opens a ticket in our ticketing system. Once your account is created, you will be notified by email.

4. Log in

Step 1 Connect to the UCI campus VPN; see the UCI campus VPN instructions.

Step 2 Open your Terminal application

Step 3 In your Terminal application, start an ssh session. You will need to use your regular UCI credentials (UCINetID and password) to connect to an HPC3 login node, hpc3.rcic.uci.edu.

For example, a user with UCINetID panteater will use the following command and, when prompted for a password, will enter their UCINetID password followed by the Return key:

ssh panteater@hpc3.rcic.uci.edu
panteater@hpc3.rcic.uci.edu's password:

After a successful login you will see a screen similar to the following:

Last login: Thu Jul 15 15:25:59 2021 from 10.240.58.4
+-----------------------------------------+
|  _             _             _ _ ____   |
| | | ___   __ _(_)_ __       (_) | ___|  |
| | |/ _ \ / _` | | '_ \ _____| | |___ \  |
| | | (_) | (_| | | | | |_____| | |___) | |
| |_|\___/ \__, |_|_| |_|     |_|_|____/  |
|          |___/                          |
+-----------------------------------------+
 Distro:  CentOS 7.8 Core
 Virtual: NO

 CPUs:    40
 RAM:     191.9GB
 BUILT:   2020-03-02 13:32

 ACCEPTABLE USE: https://rcic.uci.edu/documents/RCIC-Acceptable-Use-Policy.pdf
login-i15 2001%

The above screen shows informational text in the terminal window, including information about the login node (HPC3 has a few identical login nodes), when the user last logged in, and a link to the acceptable use policy.

The last line of the output, login-i15 2001%, is your shell prompt; this is where you type commands.

5. Simple commands

Those who are unfamiliar with the Linux environment will need to learn the basics of the bash shell, file editing, and using a language such as R or Python. Please see the Tutorials page, which lists links to online beginner tutorials.

The cluster shell is bash, a command language interpreter that executes commands read from the standard input (what you type). The prompt is provided by the bash shell automatically; you don't need to type it.

Below is a small set of simple but very useful commands to try. What you type follows the shell prompt. Each command returns output that will be displayed in your terminal and will look similar to the following:

[user@login-x:~]$ pwd       (1)
/data/homezvol0/panteater

[user@login-x:~]$ date      (2)
Mon Jul 19 12:43:42 PDT 2021

[user@login-x:~]$ hostname  (3)
login-i15

[user@login-x:~]$ ls        (4)
perl5

[user@login-x:~]$ ls -la    (5)
drwx------   3 panteater panteater    9 Jul 13 00:02 .
drwxr-xr-x 785 root      root       785 Jul 16 10:32 ..
-rw-r--r--   1 panteater panteater  183 Jul 12 14:42 .bash_profile
-rw-r--r--   1 panteater panteater  541 Jul 12 14:42 .bashrc
-rw-r--r--   1 panteater panteater  500 Jul 12 14:42 .emacs
-rw-r--r--   1 panteater panteater   17 Jul 12 14:42 .forward
-rw-------   1 panteater root      1273 Jul 13 00:02 .hpc-selective-backup
-rw-------   1 panteater root         0 Jul 13 00:02 .hpc-selective-backup-exclude
drwxrwxr-x   2 panteater panteater    2 Jun 15 09:48 perl5
1 the pwd command prints the name of the current (working) directory
2 the date command prints the current date and time in the default format
3 the hostname command prints the current host name. The cluster has a few login nodes and multiple compute nodes, each with its own unique name.
4 the ls command prints the directory contents.
5 the ls command with the flags -la lists all contents in long format, including hidden files (those whose names start with a dot).

Many commands need no arguments or additional flags, just like most of the examples above. Arguments and flags given to a command make its behavior or output more specific, as in the last example above.

To learn about specific commands, consult tutorials or the manual pages via the man command. For example, to learn more about the ls command, type (use the space bar to scroll through the output on the screen):

[user@login-x:~]$ man ls

6. File editing

The cluster environment is not well suited for GUI applications: users need to type most commands, and there are no 'clickable' icons or pop-up windows.

Users will need to learn one of the file editors vim or emacs. Choose the editor that is more intuitive for you; many online tutorials and beginner guides explain how to use these programs.

On Unix, avoid using special characters in file or directory names. Special characters are interpreted by bash and have a non-literal meaning. Whitespace (a tab, newline, vertical tab, form feed, carriage return, or space) is one example. Please see a list of special characters and avoid using them in file names.
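
For example, a file name with a space in it must be quoted every time you use it, while a name made of letters, digits, underscores, and dashes needs no special treatment. The file names below are made up for illustration, and the exact error text may differ:

[user@login-x:~]$ ls -l my data.txt      # the space splits this into two arguments
ls: cannot access my: No such file or directory
ls: cannot access data.txt: No such file or directory
[user@login-x:~]$ ls -l 'my data.txt'    # quoting works, but only if you remember the quotes
[user@login-x:~]$ ls -l my_data.txt      # a safe name needs no quoting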

7. Running applications

The cluster is a shared resource; at any given time there can be many users and hundreds of jobs running. What you do can adversely affect others. We use the Slurm scheduler to run CPU-intensive or long-running applications.

Please follow simple rules of conduct to avoid problems:

  • Do not run computational jobs on login nodes. Login nodes are meant for light editing, short compilations, and submitting jobs.

  • Do not run Slurm jobs in your $HOME. Instead, use your DFS storage for this: /pub/<account>.

  • Any job that runs for more than an hour, or that uses significant memory and CPU even within an hour, should be submitted to the Slurm scheduler.

See details in Slurm submission sections below.

8. Slurm

Slurm is an open-source workload manager for Linux clusters that provides:

  1. access to resources (compute nodes) so users can run their applications.

  2. a framework to start, execute, and monitor work on a set of allocated nodes.

  3. management of a queue of pending work.

HPC3 has different kinds of hardware, memory footprints, and nodes with GPUs. All nodes (servers) are separated into groups according to their resources. Slurm uses the term partition to signify a queue of resources. We have a few separate partitions; most users will need the standard and free partitions (a sketch of how to list partitions follows this list):

  • The standard partition is for jobs that should not be interrupted. Usage is charged against the user's Slurm bank account. Each user gets a FREE one-time allocation of 1000 core-hours to run jobs here; users are NOT CHARGED ANY $. Once this allocation is used up, users can run jobs only if they are associated with labs that have core-hours in their lab banks. Usually, a lab bank is a PI's lab account.

  • The free partition is for jobs that can be preempted (killed) by jobs from the standard partition. Users can run jobs in this partition even if they have only 1 core-hour left. There are no charges for this partition.
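
To see which partitions exist and their current state, you can use the standard Slurm sinfo command. A minimal sketch (the exact partitions, limits, and node names you see on HPC3 may differ):

[user@login-x:~]$ sinfo -s        # one summary line per partition: name, availability, time limit, node counts
[user@login-x:~]$ sinfo -p free   # list the nodes and their states in the free partition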

8.1. Slurm interactive job

To request an interactive job, use the srun command. Suppose you are enabled to charge to the panteater_lab account; then, to start an interactive session, you can use one of 3 methods:

[user@login-x:~]$ srun --pty /bin/bash -i                   (1)
[user@login-x:~]$ srun --pty -p free /bin/bash -i           (2)
[user@login-x:~]$ srun -A panteater_lab --pty /bin/bash -i  (3)
1 you will be put on an available node in the standard partition using your default Slurm bank account
2 you will be put on an available node in the free partition using your default Slurm bank account
3 you will be put on an available node in the standard partition using the panteater_lab account

Once you execute the command, Slurm will place you on a different node (not a login node) and you will see a new shell prompt in the terminal, for example:

[user@hpc3-l18-04:~]$

Now you can run your applications and commands from the command line.

After you are done, use the logout command to log out:

[user@hpc3-l18-04:~]$ logout

This will end your Slurm interactive session and you will return to the terminal window on the login node.
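
By default srun gives you a small allocation (typically a single CPU core). If your interactive work needs more, you can add standard Slurm options to the same command; the values below only illustrate the syntax and are not a recommendation:

[user@login-x:~]$ srun -p free --cpus-per-task=4 --mem=8G --time=02:00:00 --pty /bin/bash -i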

8.2. Slurm batch job

Slurm batch jobs can be submitted to the same queues (partitions) as interactive jobs. A batch job is run at some time in the future; the scheduler picks an available time and node. Usually this is within minutes, or as soon as the requested resources become available. Slurm balances resource usage among many users and many jobs.

A user needs to use the sbatch command and a Slurm submit script.

A Slurm submit script is a text file that describes the job the user wants to execute and contains directives telling Slurm what resources are needed.
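
To give an idea of what such a script contains, here is a minimal sketch; the directive values are illustrative assumptions, and the firstjob.sub script you download in the steps below may use different options and values:

#!/bin/bash
#SBATCH --job-name=firstjob        ## name that appears in the queue
#SBATCH -p standard                ## partition to run in
#SBATCH --nodes=1                  ## number of nodes
#SBATCH --ntasks=1                 ## number of tasks (processes)
#SBATCH --cpus-per-task=1          ## CPU cores per task
#SBATCH --output=firstjob.%j.out   ## %j is replaced with the job ID
#SBATCH --error=firstjob.%j.err    ## file for error messages

## the commands below run on the allocated compute node;
## depending on the cluster setup you may first need to load a python module
echo "Running job on host $(hostname)"
python days.py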

In the steps below you will download an example Slurm script and an example Python script, submit the Slurm script to the scheduler, and check the job output file. All commands are executed on the cluster, and all files are downloaded from the web server to the filesystem allocated to you on the cluster. The Slurm script and Python script don't need editing after the download and can be used as is.

  1. Step: download an example batch script

    Type all 4 commands exactly as they are shown.

    [user@login-x:~]$ cd /pub/$USER                                        (1)
    [user@login-x:~]$ wget https://rcic.uci.edu/hpc3/examples/firstjob.sub (2)
    [user@login-x:~]$ wget https://rcic.uci.edu/hpc3/examples/days.py      (3)
    [user@login-x:~]$ cat firstjob.sub                                     (4)
    1 use the cd command to change to your DFS allocation area; here $USER is a shortcut for your UCINetID.
    2 use the wget command to download the example Slurm submit script and save it as the file firstjob.sub
    3 use the wget command to download the example Python script and save it as the file days.py. It is a simple Python program that prints today's date and a random day 1-365 days in the past.
    4 use the cat command to show the contents of the Slurm script in the Terminal window.
  2. Step: submit job to Slurm scheduler

    [user@login-x:~]$ sbatch firstjob.sub
    Submitted batch job 5776081

    The output shows that the script was submitted as a job with ID 5776081. All job IDs are unique; yours will be different, and the output file name of your job will reflect a different ID.

  3. Step: Check the job status and output file

    This test job will run very quickly (a fraction of a second) because it executes a few very fast commands and has no computational component.

    [user@login-x:~]$ squeue -u $USER           (1)
    JOBID   PARTITION   NAME  USER  ACCOUNT ST   TIME  CPUS NODE NODELIST(REASON)
    
    [user@login-x:~]$ ls                       (2)
    firstjob.5776081.err  firstjob.5776081.out  firstjob.sub
    
    [user@login-x:~]$ cat firstjob.5776081.out (3)
    Running job on host hpc3-l18-05
    Today is 2021-07-23 and 325 days ago it was 2020-09-01
    1 use the squeue command to check the status of your job; $USER is a shortcut for your UCINetID. When the output shows only the header line, as here, the job has already finished; otherwise there will be a line with information about your job (see the sacct note after these steps).
    2 use the ls command to list the files in the current directory. There are 2 additional files listed; these are the error/output files produced by the Slurm job, as requested in the submit script.
    3 use the cat command to show the contents of the output file in the Terminal window. The text shows the output of the commands that were run via the firstjob.sub submit script.
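
Once a job has finished it no longer shows up in squeue. If you want to look up a finished job later, the standard Slurm accounting command sacct can report on it. A minimal sketch (the columns shown depend on the site configuration):

[user@login-x:~]$ sacct -j 5776081      # replace 5776081 with your own job ID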

9. Logout

After you are done with your work on the cluster, you need to log out:

[user@login-x:~]$ logout

10. Storage

The filesystem storage is generally divided into 3 areas. Please see the links below for detailed information about each filesystem.

HOME

The HOME area has a 50 GB quota for each user. In addition, there is space for snapshots; the total for HOME and snapshots is 100 GB. Each user's HOME is in /data/homezvolX/<account>

DFS

DFS is BeeGFS-based parallel storage. All users have a /pub/<account> area. Depending on lab affiliation, users may have space in /dfs2, /dfs3a, /dfs3b, /dfs4, /dfs5, and /dfs6.

CRSP

The Campus Research Storage Pool (CRSP) is available in /share/crsp. Depending on lab affiliation, users may have space in /share/crsp/lab/<labname>/<account>

10.1. Storage quotas

In summary, a user with UCINetID panteater has read and write access to:

/data/homezvol0/panteater

HOME quota is 50 GB; use it for storing important and rarely changed files

/pub/panteater

DFS user quota is 1 TB; use it for storing data sets, documents, Slurm scripts, and job input/output

Check your quotas on a regular basis after adding or removing a lot of files, transferring data, or running computational jobs; a sketch of generic commands for checking usage follows.
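
The filesystem documentation referenced above describes the exact quota-reporting commands for each storage area. As a generic fallback, standard Linux tools can show how much space your directories currently use; a minimal sketch (du may take a while on large directory trees):

[user@login-x:~]$ du -sh $HOME          # total size of your HOME area
[user@login-x:~]$ du -sh /pub/$USER     # total size of your DFS area
[user@login-x:~]$ df -h /pub/$USER      # overall usage of the filesystem that holds your DFS area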

10.2. Data transfers

Often users need to bring data from other servers or laptops. To transfer data, use scp (secure copy) or rsync (a file copying tool). Please see the detailed data transfer examples. Alternatively, one can use graphical tools (Filezilla, MountainDuck, or WinSCP) to transfer files between a local laptop and the cluster; follow each program's instructions. An rsync sketch follows the scp examples below.

In all of the transfer applications you will need to use hpc3.rcic.uci.edu as the remote server (where you want to transfer your files) and your UCINetID credentials for your username and password.

Simple examples of file transfers with scp:

The scp command is used to transfer files and directories between a local laptop and a remote server. The command has a simple structure:

scp OPTIONS SOURCE DESTINATION

We omit OPTIONS for now; they are not needed in simple cases. The SOURCE and DESTINATION may each be specified as a local file name, or as a remote location of the form user@server:path, made of 3 parts that mean:

  • user is your UCINetID (or your account name on the cluster)

  • @server: is the server name delimited by special characters: @ separates the user name from the server name, and : separates the server name from the path name

  • path is a file path name on the server

File path names can be specified using absolute or relative names. For example, /Users/panteater/project1/input/my.fasta is an absolute name; the same file can be referred to by the relative name my.fasta when used from the directory where the file is located.
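
A short illustration on the cluster: both commands below refer to the same (hypothetical) file, first by its absolute name, which works from any directory, and then by its relative name, which works only from the directory that holds the file:

[user@login-x:~]$ ls /pub/panteater/project1/input/my.fasta    # absolute path, works from anywhere
[user@login-x:~]$ cd /pub/panteater/project1/input
[user@login-x:~]$ ls my.fasta                                  # relative path, works from here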

The examples below use the UCINetID panteater; use your own UCINetID credentials (username and password).

  1. To transfer a single file myfile.txt from your laptop to HPC3 and put it in the directory /pub/panteater

    On your laptop, use a Terminal app and change to the directory where your file is located, then execute the scp command using your UCINetID:

    user@laptop:~$ scp myfile.txt panteater@hpc3.rcic.uci.edu:/pub/panteater/myfile.txt
  2. To transfer a single file j-123.fa from HPC3 to your laptop

    On your laptop, use a Terminal app and change to the directory where you want the file to go, then execute the scp command using your UCINetID.

    user@laptop:~$ scp panteater@hpc3.rcic.uci.edu:/pub/panteater/project1/j-123.fa j-123.fa
  3. To transfer multiple files from your laptop to HPC3:

    user@laptop:~$ scp f1.py f2.py doc.txt panteater@hpc3.rcic.uci.edu:/pub/panteater
  4. To transfer the entire /pub/panteater/results/ directory from HPC3 to your laptop, creating a results/ directory with its contents in the directory where the command is executed. Note the dot at the end means copy to the current directory.

    user@laptop:~$ scp -r panteater@hpc3.rcic.uci.edu:/pub/panteater/results .
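
rsync can be used in place of scp for the same kinds of transfers; it is particularly convenient for repeating a transfer, because it copies only files that have changed. A minimal sketch using the same example directory (the -a flag preserves permissions and times, -v prints what is being transferred):

user@laptop:~$ rsync -av panteater@hpc3.rcic.uci.edu:/pub/panteater/results .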