
Slurm Guide for HPC3
1. Overview
HPC3 uses the Slurm scheduler. Slurm is widely used at supercomputing centers and is actively maintained. Many of the concepts of SGE have equivalents in Slurm; Stanford has a guide of equivalent commands. There is also a nice quick reference guide directly from the developers of Slurm.
We provide numerous EXAMPLES that show in depth how to run array jobs and how to request GPUs, CPUs, and memory for a variety of different job types and common applications.
1.1. Simple code of conduct
-
All jobs, batch or interactive, must be submitted to the scheduler.
-
Do not run computational jobs on login nodes; this adversely affects many users. Login nodes are meant for light editing, compilation, and job submission. Any job that runs for more than an hour or uses significant memory and CPU within an hour should be submitted to Slurm as either an interactive or a batch job.
-
Long-running jobs on login nodes will be removed.
-
We reserve the right to limit access for the users who abuse the system.
-
SSH access to the compute nodes is turned off to prevent users from bypassing Slurm when starting jobs. See Attach to a job below.
-
Do not run Slurm jobs in your $HOME.
-
Make sure you stay within your disk quota. File system limits are generally the first ones that will negatively affect your job. See the storage guides.
1.2. HPC3 Queue Structure
Slurm uses the term partition to signify a batch queue of resources. HPC3 has different kinds of hardware, memory footprints, and nodes with GPUs. Jobs running in some queues will charge core-hours (or GPU-hours) to the account.
Please do not override the memory defaults unless your particular job really requires it. Analysis of more than 3 million jobs on HPC3 indicated that more than 98% of jobs fit within the defaults. With slightly smaller memory footprints, the scheduler has MORE choices as to where to place jobs on the cluster.
Memory is allocated per-CPU core. When you request more cores, your job is allocated more memory.
Partition | Default memory/core 1 | Max memory/core | Default / Max runtime | Cost | Jobs preemption |
---|---|---|---|---|---|
CPU Partitions | | | | | |
standard | 3 GB | 6 GB | 2 day / 14 day | 1 / core-hr | No |
free | 3 GB | 18 GB | 1 day / 3 day | 0 | Yes |
debug | 3 GB | 18 GB | 15 min / 30 min | 1 / core-hr | No |
highmem 2 | 6 GB | 10 GB | 2 day / 14 day | 1 / core-hr | No |
hugemem 2 | 18 GB | 18 GB | 2 day / 14 day | 1 / core-hr | No |
maxmem 2,3 | 1.5 TB/node | 1.5 TB/node | 1 day / 2 day | 40 / node-hr | No |
GPU Partitions | | | | | |
gpu 4 | 3 GB | 9 GB | 2 day / 14 day | 1 / core-hr, 32 / GPU-hr | No |
free-gpu 5 | 3 GB | 9 GB | 1 day / 3 day | 0 | Yes |
gpu-debug 5 | 3 GB | 9 GB | 15 min / 30 min | 1 / core-hr, 32 / GPU-hr | No |
1 In this table, 1 GB = 1000 MB.
2 You must be added to a specific group to access the highmem / hugemem / maxmem partitions. If you are not a member of these groups, you will not be able to submit jobs to these partitions, and sinfo will not show them.
3 The maxmem partition is a single 1.5 TB node reserved for those rare applications that really require that much memory. You can only be allocated the entire node. No free jobs run in this partition.
4 You must have a GPU account and you must specify it in order to submit to the gpu/gpu-debug partitions. This is because of differential charging. GPU accounts are not automatically given to everyone; your faculty adviser can request a GPU lab account.
5 Anyone can run jobs in the free-gpu partition without a special account.
1.3. How accounts are charged
Every HPC3 user is granted 1,000 free CPU hours as a startup allowance. This allocation is there for you to become familiar with HPC3, the Slurm scheduler, and accounting.
Each CPU core-hour (allocation unit) is charged to the account you specify (or your default account, which is your personal account). Similarly, each GPU hour is charged to a GPU-enabled account at the rate of 32/GPU-hour plus the core-hour charge for the CPU cores the job uses (at least one CPU core is required). A GPU-enabled account can only be used on GPU nodes.
Using | Will be charged |
---|---|
1 core X 1 hour | 1 unit |
1 core X 6 minutes | 0.1 units |
10 cores X 1 hour | 10 units |
(1 GPU + 1 CPU core) X 1 hour | 33 units |
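For example, a job that uses 1 GPU and 4 CPU cores for 2 hours would be charged (1 x 32 + 4 x 1) x 2 = 72 units.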
Most jobs run on HPC3 are charged to a lab account because most HPC3 users are part of at least one research lab. If a user or a research lab runs out of CPU hours, more CPU hours can be purchased via recharge.
Any UCI professor can request an HPC3 lab account on behalf of their research group and add any number of researchers/students to this account. The aspirational goal is that faculty who request an account will be granted 200,000 CPU hours per fiscal year; the exact number will vary based on the number of requests and the number of nodes that have been purchased by RCIC.
You may request your Slurm lab account by emailing hpc-request@uci.edu. In the email, please specify
-
PI user name
-
User names (if any) of the researchers, graduate students or other collaborators that should be able to charge CPU hours to the lab account.
-
Define account Coordinators - one or two lab members (typically Postdocs and/or Project Specialists) for the Slurm lab account. Account Coordinators are able to manage the group members' jobs, modify their queue priority, update limits for the total CPU hours for individual members, etc.
1.3.1. Free Jobs
The free queues are designed to allow the cluster to run at 100% utilization while still giving allocated jobs very quick access to cores. This is accomplished by allowing allocated jobs to displace (kill) running free jobs. The design of HPC3 is that, on average, about 20% of the cluster is available for free jobs.
-
Jobs submitted to free partitions are not charged to an account
-
Free jobs can be killed at any time to make room for allocated jobs
When the standard partition, where allocated (paid) jobs run, becomes full, jobs in the free partition are killed in order to let the allocated jobs run with priority. To get as much goodput through the system as possible, the most recently started free jobs are killed first.
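As a minimal sketch (the job name and command are placeholders), a batch job is directed to the free partition with a submit-script header like the following; no account needs to be specified because free jobs are not charged:

#!/bin/bash
#SBATCH --job-name=test_free   ## placeholder job name
#SBATCH -p free                ## free partition: no charge, but the job may be preempted
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1

hostname > out.txt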
1.3.2. Allocated Jobs
-
Jobs submitted to the standard partition are allocated jobs
-
Once a job starts running in the standard queue, it will run to completion.
Standard jobs have the following properties:
-
Standard jobs cannot be killed preemptively by any other job.
-
Standard jobs preempt free jobs.
-
Standard jobs with QOS set to normal are charged for the CPU time consumed.
-
Standard jobs with QOS set to high are charged double the CPU time consumed.
-
Standard jobs with QOS set to high are placed at the front of the job queue. They are meant to be used when the time from submission to running is of the essence (e.g., grant proposals and paper deadlines).
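As a sketch (assuming high QOS is enabled for your account; see man sbatch and check with your account coordinator), a standard job can request the high QOS by adding the qos option to its submit script:

#SBATCH -p standard
#SBATCH --qos=high   ## charged double the CPU time, placed at the front of the queue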
1.3.3. Recommendations
Charging jobs to an account is new for the UCI community. Like any policy, it can be two-edged. A large fraction of users should be able to run allocated jobs and never see the limits of their accounts. However, users who are accessing a very large number of free cores are likely to have some of their free jobs preempted (killed).
Get the most from your allocation:
-
Look at your past jobs and see how much CPU resource was used. Don’t request more than needed.
-
Prioritize your own work. Test and low-priority jobs can go to the free partition; others should run as allocated jobs.
-
Understand that free "comes with no guarantees". Your free job can be killed at any time.
1.3.4. Quota Enforcement
When HPC3 users exceed their disk space or CPU hours allocations, the following will happen:
-
users will not be able to submit new jobs
-
running jobs will fail
Please check the available disk space and CPU hours in your Slurm account regularly. Delete or archive data as needed.
2. Quick Start
2.1. Example Scripts
We provide numerous EXAMPLES that show in depth how to run array jobs and how to request GPUs, CPUs, and memory for a variety of different job types and common applications. The scripts can be downloaded from this directory.
There are a few methods to submit your jobs to Slurm: batch, interactive, and running jobs immediately. The sections below show a few common submission details.
2.2. Batch Job
A batch job is run at some time in the future by the scheduler.
Submitting batch jobs to Slurm is accomplished through the sbatch command, and the job description is provided by the submit script. An example job description is shown below:
#!/bin/bash
#SBATCH --job-name=test ## Name of the job.
#SBATCH -A panteater_lab ## account to charge (1)
#SBATCH -p standard ## partition/queue name
#SBATCH --nodes=1 ## (-N) number of nodes to use
#SBATCH --ntasks=1 ## (-n) number of tasks to launch
#SBATCH --cpus-per-task=1 ## number of cores the job needs
#SBATCH --error=slurm-%J.err ## error log file
# Run command hostname and save output to the file out.txt
hostname > out.txt
To submit a job on HPC3, log in and, using your favorite editor, create a file simplejob.sub with the contents shown above.
1 | Edit the Slurm account to charge for the job to either your personal account or lab account. Your personal account is the same as your UCINetID. |
To submit the job do:
[user@login-x:~]$ sbatch simplejob.sub
Submitted batch job 362
When the job has been submitted, Slurm returns a job ID that will be used to reference the job in Slurm user log files and Slurm job reports. After the job is finished look at the file out.txt to see the name of the compute node that ran the job.
2.3. Job Status
To check the status of your job in the queue:
[user@login-x:~]$ squeue -u panteater
  JOBID PARTITION     NAME      USER ST  TIME  NODES NODELIST(REASON)
    362  standard     test panteater  R  0:03      1 hpc3-17-11
To get detailed info about the job:
[user@login-x:~]$ scontrol show job 362
The output will contain a list of key=value pairs that provide job information.
AVOID using the watch command to query the Slurm queue in a continuous loop, as shown below:
[user@login-x:~]$ watch -d squeue <...other arguments...>
Frequent querying of the Slurm queue creates unnecessary overhead and affects many users. Instead, check your job output, use mail notification for the job end, or run the squeue command only when you want to see an update.
2.4. Immediate job
The srun command is used to run a job immediately and uses your console for stdin / stdout / stderr (standard input/output/error) instead of redirecting them to a file.
Srun submits jobs for immediate execution but it does not bypass scheduler priority. If your job cannot run immediately, you will wait until Slurm can schedule your request.
The main difference between srun and sbatch:
-
srun is interactive and blocking. Srun is quite useful in the debug or free queues. Srun is often used to create job steps in sbatch scripts or to run interactive jobs.
-
sbatch is batch processing and non-blocking. Sbatch can do everything srun can and more.
2.5. Interactive Job
To request an interactive shell, use the salloc command. For example, user panteater can use one of the following to submit a job to the standard partition:
[user@login-x:~]$ salloc srun --pty /bin/bash -i                        (1)
[user@login-x:~]$ salloc -A PI_LAB --ntasks=4 srun --pty /bin/bash -i   (2)
1 | Get an interactive node reserving 1 CPU (default) and the panteater account (default). |
2 | Get an interactive node reserving 4 CPUs and charge it to the PI_LAB account. Note, salloc creates a resource allocation, which is why the options for the account, partition, CPUs, GPUs, etc. must be entered before srun in the command order. srun is used to run a command; options needed for the command must be specified after srun. In this case, srun requests a pseudo-terminal to execute the /bin/bash command and redirects all stdin to the terminal. |
A simpler way to run an interactive job is to use srun. For example:
[user@login-x:~]$ srun -A PI_LAB --pty /bin/bash -i                   (1)
[user@login-x:~]$ srun -p free --pty /bin/bash -i                     (2)
[user@login-x:~]$ srun --mem=8G -p free --pty /bin/bash -i            (3)
[user@login-x:~]$ srun -c 4 --time=10:00:00 -N 1 --pty /bin/bash -i   (4)
1 | start an interactive session in the standard partition and charge it to the PI_LAB account |
2 | start an interactive session in the free partition (where it may be killed at any time) |
3 | ask for 8 GB of memory for the job (when you truly need it) |
4 | ask for 4 CPUs for 10 hrs |
Once the salloc or srun command is executed, the scheduler allocates the available resources and starts an interactive shell on the chosen node. Your shell prompt will indicate the new hostname.
Once done with your work simply type at the prompt:
[user@login-x:~]$ exit
2.6. Interactive GUI job
To run an interactive session for GUI jobs, a user must log in with X forwarding enabled in ssh (see the Reference guide) and then use the --x11 option to enable X forwarding in the srun command.
[user@login-x:~]$ srun -p free --x11 --pty /bin/bash -i
2.7. Attach to a job
SSH access to compute nodes is turned off.
Users need to use Slurm commands to attach to a running job if they want to run simple verification commands on the node where the job is running.
-
For example, find your running job and use its jobid to attach to it:
[user@login-x:~]$ squeue -u panteater
  JOBID PARTITION     NAME      USER ST      TIME  NODES NODELIST(REASON)
3559123      free    Tst41 panteater  R  17:12:33      5 hpc3-14-02
3559124      free    Tst42 panteater  R  17:13:33      7 hpc3-14-17,hpc3-15-[05-08]
[user@login-x:~]$ srun --pty --jobid 3559123 /bin/bash
This will put a user on the same node hpc3-14-02 where the job is running, inside the cgroup (CPU, RAM, etc.) of the running job. This means the user will be able to execute simple commands such as ls, top, ps, etc., but will not be able to start new processes that use resources beyond what was allocated to the job. Any command will use computing resources and therefore will add to the usage of the job. After executing your desired verification commands, simply type exit; the original job will still be running.
For jobs spanning multiple nodes, use the -w switch to specify a particular node:
[user@login-x:~]$ srun --pty --jobid 3559124 -w hpc3-14-17 /bin/bash
-
Most often users just need to see the processes of the job. Such commands can be run directly, for example:
[user@login-x:~]$ srun --pty --jobid $JOBID top
2.8. Email notification
To receive email notification on the status of jobs, include the following lines in your submit scripts and make the appropriate modifications to the second line:
#SBATCH --mail-type=fail,end
#SBATCH --mail-user=user@domain.com
The first line specifies the event types for which a user requests an email (here, failure and job-end events); the second specifies a valid email address. We suggest using only a few event types, especially if you submit hundreds of jobs. See the output of the man sbatch command for more info.
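For example, added to the simple batch script from the Batch Job section (the address below is a placeholder; use your own), the notification directives sit alongside the other #SBATCH lines:

#!/bin/bash
#SBATCH --job-name=test
#SBATCH -A panteater_lab                ## account to charge
#SBATCH -p standard
#SBATCH --ntasks=1
#SBATCH --mail-type=fail,end            ## email on failure and at job end
#SBATCH --mail-user=panteater@uci.edu   ## placeholder address

hostname > out.txt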
2.9. Node Selection Using Constraints
HPC3 has heterogeneous hardware with several different CPU types available. In Slurm you can request that a job run only on nodes with certain "features". We add features to assist users who have discussed their specific needs. Using a constraint is straightforward in Slurm. For example, to run only on nodes that support the AVX512 instruction set, add the following to the submission:
#SBATCH --constraint=avx512
We have defined the following features for node selection:
Feature Name | Description (processor/storage) | Node Count | Cores (Min, Mode, Max) |
---|---|---|---|
intel | Select Intel node (including HPC legacy) | compute: 162 | compute: 24, 40, 48 |
avx512 | Select Intel node supporting AVX512 instructions | compute: 157 | compute: 28, 40, 48 |
epyc or amd | Select AMD EPYC node | 22 | 40, 64, 64 |
epyc7551 | Select AMD EPYC 7551 node only | 18 | 40, 64, 64 |
epyc7601 | Select AMD EPYC 7601 node only | 4 | 64, 64, 64 |
nvme or fastscratch | Select Intel AVX512 node only with /tmp on NVMe | 40 | 48, 48, 48 |
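Features can also be combined. For example (a sketch; see man sbatch for the full --constraint syntax), either of the following directives could be used:

#SBATCH --constraint=nvme             ## node with fast local /tmp on NVMe
#SBATCH --constraint="intel&avx512"   ## node matching both features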
2.10. Default Settings
2.10.1. Node Information
The sinfo command provides information about nodes and partitions. A few useful examples:
-
The following command will give a short table where nodes are grouped by their features (output trimmed):
[user@login-x:~]$ sinfo -o "%60N %10c %10m %30f %10G" -e NODELIST CPUS MEMORY AVAIL_FEATURES GRES hpc3-14-[00-31],hpc3-15-[00-19],... 40 180000 intel,avx512 (null) hpc3-19-16 44 500000 intel (null) hpc3-20-[23,25-32] 48 180000 intel,avx512 (null) hpc3-l18-[04-05] 28 245000 intel,avx512 (null) hpc3-18-02 64 244000 amd,epyc,epyc7601 (null) hpc3-19-[07,17],hpc3-l18-03 64 500000 amd,epyc,epyc7551 (null) hpc3-21-[00-32],hpc3-22-[00-04] 48 180000 intel,avx512,fastscratch,nvme (null) hpc3-20-[16-20,24] 48 372000 intel,avx512 (null) hpc3-gpu-16-00 40 180000 intel,avx512 gpu:V100:4 hpc3-l18-02 40 1523544 amd,epyc,epyc7551 (null) hpc3-gpu-16-[01-07],hpc3-gpu-17-... 40 180000 intel,avx512 gpu:V100:4 hpc3-gpu-18-00 40 372000 intel,avx512 gpu:V100:4
-
The following command will produce the same table but for each node without grouping:
[user@login-x:~]$ sinfo -o "%20N %10c %10m %20f %10G" -N NODELIST CPUS MEMORY AVAIL_FEATURES GRES hpc3-14-00 40 192000 avx512 (null) hpc3-14-00 40 192000 avx512 (null) hpc3-14-01 40 192000 avx512 (null) hpc3-14-01 40 192000 avx512 (null) ... output cut ...
-
This command will give an output just for one specified node:
[user@login-x:~]$ sinfo -o "%20N %10c %10m %20f %10G" -n hpc3-14-00 NODELIST CPUS MEMORY AVAIL_FEATURES GRES hpc3-14-00 40 192000 avx512 (null)
Run the man sinfo command for detailed information about the options.
2.10.2. Node Memory
There are nodes with three different memory footprints. Slurm uses Linux cgroups to enforce that applications do not use more memory/cores than they have been allocated.
Some nodes have Graphics Processing Units (GPUs), and these are defined in separate queues. Please see the default settings for Slurm partitions.
Users cannot submit jobs to highmem/hugemem without first being added to special groups. A user must either (a) be a member of a group that purchased these node types or (b) demonstrate that their applications require more than the standard memory. There is no difference in cost per core-hour on any of the CPU partitions.
If you want more memory on a standard memory node, you should request more cores. You will be charged more for this, but you use a larger fraction of the node.
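For example, a sketch based on the 3 GB/core default in the standard partition: a job that needs roughly 12 GB can simply request 4 cores:

#SBATCH -p standard
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4   ## 4 cores x 3 GB/core default = ~12 GB of memory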
2.10.3. Queue configuration
The scontrol command can be used to view Slurm configuration including job, node, partition, reservation, and overall system configuration. For example, to display information about the standard queue:
[user@login-x:~]$ scontrol show partition=standard
PartitionName=standard
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=YES QoS=normal
   DefaultTime=2-00:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=65 MaxTime=14-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=40
   Nodes=hpc3-14-[00-31],hpc3-15-[00-19,21,24-31],hpc3-17-[08-11]
   PriorityJobFactor=100 PriorityTier=100 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=2600 TotalNodes=65 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerCPU=4500 MaxMemPerCPU=4500
The output contains information about default configuration settings. The key Nodes lists all the nodes for the queue. We can get more information about a specific node:
[user@login-x:~]$ scontrol show node=hpc3-14-00
Please run the man scontrol command for available options.
3. Monitor jobs
3.1. Job History
We have a cluster-specific tool, zotledger, to print a ledger of jobs based on specified arguments. By default it prints the jobs of the current user for the last 30 days:
[user@login-x:~]$ /pub/hpc3/zotledger -u panteater
      DATE      USER   ACCOUNT PARTITION   JOBID JOBNAME ARRAYLEN CPUS WALLHOURS  SUs
2021-07-21 panteater panteater  standard 1740043    srun        -    1      0.00 0.00
2021-07-21 panteater panteater  standard 1740054    bash        -    1      0.00 0.00
2021-08-03 panteater    lab021  standard 1406123    srun        -    1      0.05 0.05
2021-08-03 panteater    lab021  standard 1406130    srun        -    4      0.01 0.02
2021-08-03 panteater    lab021  standard 1406131    srun        -    4      0.01 0.02
    TOTALS         -         -         -       -       -        -    -      0.07 0.09
To find all available arguments use:
[user@login-x:~]$ /pub/hpc3/zotledger -h
3.2. Job Info
sacct can be used to see accounting data for all jobs and job steps. The example below shows how to use a job ID with the command:
[user@login-x:~]$ sacct -j 43223
       JobID  JobName  Partition      Account  AllocCPUS      State ExitCode
------------ -------- ---------- ------------ ---------- ---------- --------
   36811_374    array   standard panteater_l+          1  COMPLETED      0:0
The above command uses the default output format. A more useful example sets a specific format for sacct that provides extra information:
[user@login-x:~]$ export SACCT_FORMAT="JobID%20,JobName,Partition,Elapsed,State,MaxRSS,AllocTRES%32"
[user@login-x:~]$ sacct -j 600
     JobID  JobName Partition  Elapsed      State  MaxRSS                        AllocTRES
---------- -------- --------- -------- ---------- ------- --------------------------------
       600     all1  free-gpu 03:14:42  COMPLETED         billing=2,cpu=2,gres/gpu=1,mem=+
 600.batch    batch           03:14:42  COMPLETED 553856K           cpu=2,mem=6000M,node=1
600.extern   extern           03:14:42  COMPLETED       0 billing=2,cpu=2,gres/gpu=1,mem=+
The MaxRSS value shows your job's memory usage.
Other useful options in SACCT_FORMAT are User, NodeList, and ExitCode. To see all available options for the format, see the man sacct man page.
The Slurm efficiency script seff can be used after the job completes to find useful information about the job, including memory and CPU use and efficiency.
[user@login-x:~]$ seff -j 4322385
seff doesn’t produce accurate results for multi-node jobs. Use this command for single node jobs.
3.3. Job Statistics
sstat displays resource utilization information for running jobs and job steps. For example, to print a job’s average CPU time (avecpu), average number of bytes written by all tasks (AveDiskWrite), average number of bytes read by all tasks (AveDiskRead), as well as the total number of tasks (ntasks), execute:
[user@login-x:~]$ sstat -j 125610 --format=jobid,avecpu,aveDiskWrite,AveDiskRead,ntasks
       JobID     AveCPU AveDiskWrite  AveDiskRead   NTasks
------------ ---------- ------------ ------------ --------
125610.batch 10-18:11:+ 139983973691 153840335902        1
3.4. Allocations
sbank is short for "Slurm Bank" and is used to display HPC3 user account information. In order to run jobs on HPC3, a user must have available CPU hours. To check how many CPU hours are available in your personal account, run the command with your account name:
[user@login-x:~]$ sbank balance statement -a panteater
User            Usage |        Account    Usage | Account Limit Available (CPU hrs)
---------- ---------- + -------------- -------- + ------------- ---------
panteater*         58 |      PANTEATER       58 |         1,000       942
To check how many CPU hours are available in all accounts that you have access to and how much you used:
[user@login-x:~]$ sbank balance statement -u panteater
User            Usage |        Account    Usage | Account Limit Available (CPU hrs)
---------- ---------- + -------------- -------- + ------------- ---------
panteater*         58 |      PANTEATER       58 |         1,000       942
panteater*      6,898 |         PI_LAB    6,898 |       100,000    93,102
The panteater* in the output means the command was run by user panteater. The user name is not reflected in the generic prompt [user@login-x:~]$ used for these examples.
3.5. Pending Job
Once you submit your job, it should start running depending on node availability, job priority, and the requested resources. However, sometimes a job stays in PD (pending) status for a long time. Here is how to determine why.
-
Check the queue status for your jobs
[user@login-x:~]$ squeue -u panteater
  JOBID PARTITION     NAME      USER ST  TIME  NODES NODELIST(REASON)
1666961  standard     tst1 panteater PD  0:00      1 (AssocGrpCPUMinutesLimit)
1666962  standard     tst2 panteater PD  0:00      1 (AssocGrpCPUMinutesLimit)
Note, in this case the reason is AssocGrpCPUMinutesLimit which means there is not enough balance left in the account.
-
Check your account balance
[user@login-x:~]$ sbank balance statement -u panteater
User             Usage |        Account    Usage | Account Limit Available (CPU hrs)
----------  ---------- + -------------- -------- + ------------- ---------
panteater *          0 |      PANTEATER        0 |         1,000     1,000
The account has 1,000 CPU hours available.
-
Check the job requests
[user@login-x:~]$ squeue -o "%i %u %j %C %T %L %R" -p standard -t PD -u panteater
JOBID USER NAME CPUS STATE TIME_LEFT NODELIST(REASON)
1666961 panteater tst1 16 PENDING 3-00:00:00 (AssocGrpCPUMinutesLimit)
1666962 panteater tst2 16 PENDING 3-00:00:00 (AssocGrpCPUMinutesLimit)
Each job asks for 16 CPUs for 3 days, which is 16 * 24 * 3 = 1152 core-hours, more than the 1,000 core-hours available in the account.
These jobs will never be scheduled to run and need to be removed from the queue, as shown below.
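For example, the two stuck jobs above can be removed with scancel (see the Hold/Release/Cancel jobs section below):

[user@login-x:~]$ scancel 1666961 1666962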
3.5.1. Pending Job Reasons
Jobs submitted to Slurm will start as soon as the scheduler can find an appropriate resource match. While lack of resources or insufficient account balance are common reasons that prevent a job from starting (technically, the job is in the "pending" state), there are other possibilities. From a policy point of view, RCIC does not generally put limits in place unless we see unreasonable impact on shared resources (often, file systems) or other fairness issues. To see the state reasons of your pending jobs, you can issue the squeue command as in the following example. The $(whoami) used in the command below expands to your user account name; you can simply type your user name instead.
[user@login-x:~]$ squeue -t PD -u $(whoami)
JOBID PARTITION NAME USER ACCOUNT ST TIME CPUS NODE NODELIST(REASON)
92005  standard watA peat   p_lab PD 0:00    1    1 (ReqNodeNotAvail,Reserved for maintenance)
92008  standard watA peat   p_lab PD 0:00    1    1 (ReqNodeNotAvail,Reserved for maintenance)
92011  standard watA peat   p_lab PD 0:00    1    1 (ReqNodeNotAvail,Reserved for maintenance)
95475  free-gpu 7sMD peat   p_lab PD 0:00    2    1 (QOSMaxJobsPerUserLimit)
95476  free-gpu 7sMD peat   p_lab PD 0:00    2    1 (QOSMaxJobsPerUserLimit)
Reasons often seen on HPC3 for jobs in the pending state:
NODELIST(REASON) | Explanation |
---|---|
(AssocGrpCPUMinutesLimit) | Insufficient funds are available to run the job to completion. |
(Dependency) | The job has a user-defined dependency on a running job and cannot start until the previous job has completed. |
(Priority) | Slurm’s scheduler is temporarily holding the job in the pending state because other queued jobs have a higher priority. |
(QOSMaxJobsPerUserLimit) | The user is already running the maximum number of jobs allowed by the particular partition. |
(ReqNodeNotAvail, Reserved for maintenance) | If the job were to run for the requested maximum time, it would run into a defined maintenance window. The job will not start until the maintenance has been completed. |
(Resources) | The requested resource configuration is not currently available. If a job requests a resource combination that physically does not exist, the job will remain in this state forever. |
A job may have multiple reasons for not running; squeue will only show one of them.
3.5.2. Fix pending job
You will need to resubmit the job so that the requested execution hours can be covered by your account balance. Verify the following settings in your Slurm script for batch jobs (an example sketch follows the list):
-
#SBATCH -A use a different Slurm account (lab) where you have enough balance
-
#SBATCH -p free use free partition if you don’t have another account
-
#SBATCH --ntasks or #SBATCH --cpus-per-task are you requesting the correct number of CPUs
-
#SBATCH --mem or #SBATCH --mem-per-cpu are you requesting the correct amount of memory
-
#SBATCH --time set a time limit that is shorter than the default runtime (see the default settings )
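For example, a resubmission sketch (the account name and time are placeholders) that keeps the requested core-hours within a 1,000-hour balance:

#SBATCH -A panteater        ## personal account, or a lab account with sufficient balance
#SBATCH -p standard
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=12:00:00     ## 4 cores x 12 hours = 48 core-hours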
Similar fixes apply when using srun for interactive jobs.
See EXAMPLES for more info
4. Modify jobs prior to execution
It is possible to make some changes to jobs that are still waiting in the queue by using the scontrol command. If changes need to be made to a running job, it may be better to kill the job and restart it after making the necessary changes.
[user@login-x:~]$ scontrol update jobid=<jobid> timelimit=<new timelimit>   (1)
[user@login-x:~]$ scontrol update jobid=<jobid> qos=[low|normal|high]       (2)
1 | Change the time limit. Accepted formats are minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, or days-hours:minutes:seconds. For example, 2-12:30 means 2 days, 12 hours, and 30 minutes. |
2 | change QOS. By default, jobs are set to run with qos=normal. Users rarely need to change QOS. |
5. Hold/Release/Cancel jobs
[user@login-x:~]$ scontrol hold <jobid>      (1)
[user@login-x:~]$ scontrol release <jobid>   (2)
[user@login-x:~]$ scancel <jobid>            (3)
[user@login-x:~]$ scancel -u <username>      (4)
1 | To prevent a pending job from starting. |
2 | To release held jobs to run (after they have accrued priority). |
3 | To cancel a specific job. |
4 | To cancel all jobs owned by a user. This only applies to jobs that are associated with your accounts. |
6. Account Coordinators
Slurm account coordinators are users who can directly manage lab accounts. Please see the Account Coordinators Guide.
7. Quick Links
Please see guides below that provide more information and explain how to get help and how to use HPC3: