1. Frequently Asked Questions

1.1. HPC3 general 

1.1.1. Who can have an account?

Anyone with a valid UCInetID. Please see Getting an account.

1.1.3. How long can I have an account after graduation/separation from UCI?

For as long as your UCInetID is valid. Please see Closing an account.

1.1.4. Who defined the policy?

The HPC3 subcommittee of the RCIC advisory committee crafted the initial policy.

The RCIC advisory committee approved the policy.

Please see Advisory Committees.

1.1.5. How do I acknowledge RCIC?

Please see How to Acknowledge RCIC.

1.1.6. Is there a description that can be used in grant applications?

Please see Grant Support.

1.1.8. I don’t have any funds to buy cycles or hardware, can I use HPC3?

Yes. If you are a faculty member, you can have granted cycles that are yours to use any way you see fit for research. There are also the free queues, where jobs are not charged. Please see No-cost for details.

1.1.9. If I purchase core-hours, is overhead charged?

We are actively working with the UCI financial office to see if we can establish a rate that reduces the financial impact of overhead on recharged cycles.

1.2. Accounting 

1.2.1. How can I be added to my PI’s Slurm lab account?

PIs have control over who can charge to their account and how much they can charge.

Please send a request to hpc-support@uci.edu with a cc to your PI and ask us to add you to the PI’s account.

The PI must confirm via email reply to your cc that this change is allowed.

1.2.2. How do I prevent my grad student from draining my account?

Submit a ticket and ask us to set up charge limits for any particular user. If students hit their limits, they will have to ask you for more, or use the free queue.

1.2.3. Will HPC3 allow long-running (multi-day/multi-week) jobs?

Yes. It is clear that a substantial community of researchers requires this feature.

1.2.4. How does core-hour accounting impact long-running jobs?

It doesn’t. Slurm will not start a job unless the account has enough credit for it. For example, a job submitted with a requirement of 16 core-weeks will not start unless the account has $16 \times 7 \times 24 = 2688$ core-hours.
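The arithmetic in the example above can be sketched in a few lines of Python. This only illustrates the calculation and is not how Slurm implements the check; the function names are hypothetical.

```python
HOURS_PER_DAY = 24

def required_core_hours(cores, days):
    """Core-hours requested by a job: cores times wall-clock hours."""
    return cores * days * HOURS_PER_DAY

def has_enough_credit(balance_core_hours, cores, days):
    """True if the account balance covers the whole request."""
    return balance_core_hours >= required_core_hours(cores, days)

# 16 cores for one week (7 days), as in the example above
print(required_core_hours(16, 7))      # 2688
print(has_enough_credit(2000, 16, 7))  # False
```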

1.2.5. Why is my account reallocation less than 100,000 core-hours?

The no-cost reallocation is calculated every 6 months based on the lab’s previous usage. Please see details in Granted hours.

1.2.6. When is the next allocation for my Slurm lab account?

The reallocation schedule is available via the account balance command.

1.3. Files storage and transfer 

1.3.1. Can I get a copy of another student’s data?

Please see What happens to data when an account is closed.

1.3.2. Where can I store many files, some of which are large?

Depending on your lab affiliation and how much space your lab has purchased, you may have access to personal and group-access areas in the CRSP and DFS file systems. See DFS and CRSP for information on where to store files and how to check quotas.

1.3.3. How do I back up important files?

It depends on what filesystem you are using:

$HOME:

Has automatic snapshots, so you don’t need to do anything special. Please read ZFS Snapshots for details.

CRSP:

Your $HOME and LAB areas have automatic snapshots, so you don’t need to do anything special. Please see Snapshots.

DFS:

You can use Selective Backup.

1.3.4. How do I transfer files between a remote server and my directory?

Please see Data transfer.

1.3.5. How do I use FileZilla with DUO?

Please see Using FileZilla and DUO.

1.3.6. Can accidentally deleted files or directories be restored?

It depends on the time between the file's creation and its deletion. If a snapshot was taken after the file was created or last changed, you can use snapshots to restore files and directories, provided that the existing snapshots still hold the desired data.

The restoration method depends on where the files were originally located. Please see the respective guides for recovery instructions:

$HOME files

DFS files

CRSP files

1.4. DFS 

1.4.1. What are allocations for DFS?

Users have access to the private and group-shared areas on DFS:

* private: Private areas are in /pub/$USER and are for the user only, not shared with anyone.

* group-shared: UCI faculty members can have low-cost recharge allocation(s) and areas sized to fulfill their needs, and they can give group members access to these areas.

Please see Resource Allocations for details.

1.4.2. How do I purchase more DFS space?

Please see Purchase DFS storage.

1.4.3. I want to use my PI’s group DFS area, how do I do this?

If your PI already has a group DFS area, you need to submit a ticket to hpc-support@uci.edu, with a cc to your PI, requesting to be added to a specific group for access to a specific DFS filesystem.

Your PI must confirm via email reply to your cc that this change is allowed.

1.5. CRSP 

1.5.1. I want to use my PI’s group CRSP area, how do I do this?

Please see Getting CRSP Account.

1.5.2. Exactly who is entitled to a CRSP baseline allocation?

All ladder faculty and any UCI employee who can serve as PI or Co-PI on an extramural grant. Please see Resource Allocations.

1.5.3. How do I purchase more space?

Please see Purchase CRSP storage.

1.5.4. Can I expand space more than once?

Yes. We track when each of your space allocations expires and recharge appropriately.

Multiple purchases can be used to expand your space.

1.5.5. Can the recharge be used to expand my baseline allocation?

You will always have your baseline allocation and you can use recharge to buy more space.

For example, if you purchase 10 TB for 1 year ($600) and add it to your baseline, you will have 11 TB of allocated space.

Please see how to Purchase CRSP storage and Recharge for pricing.
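As a rough illustration of the example above, the sketch below adds a purchase to the baseline and computes its cost. The 1 TB baseline and the $60 per TB per year rate are inferred from the example figures (10 TB for $600, 11 TB in total), so treat both as assumptions and check Recharge for current pricing. The function names are hypothetical.

```python
BASELINE_TB = 1        # assumption: inferred from 10 TB purchased + baseline = 11 TB
RATE_PER_TB_YEAR = 60  # assumption: inferred from $600 for 10 TB for 1 year

def total_allocation_tb(purchased_tb):
    """Baseline allocation plus purchased capacity."""
    return BASELINE_TB + purchased_tb

def recharge_cost(purchased_tb, years=1):
    """Cost of the purchased capacity in dollars under the assumed rate."""
    return purchased_tb * RATE_PER_TB_YEAR * years

print(total_allocation_tb(10))  # 11
print(recharge_cost(10))        # 600
```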

1.5.6. Can I grant access to my CRSP storage to others at UCI?

Yes, under your control. You can submit a ticket and ask us to add people (by their UCInetID) to have read, write, or read/write access to your storage.

1.5.7. Can I grant access to my storage to others outside of UCI?

You will need to sponsor a UCInetID for your external collaborators. They will then be able to access CRSP using normal mechanisms. Please see Access.

1.5.8. How do I add several students/postdocs to my Lab space?

Please see Getting CRSP Account.

1.5.9. Can departments purchase CRSP space to store business data?

No. CRSP is designed and funded for research data. Storing non-research data would compromise CRSP’s status as research equipment (which has significant tax implications).

1.5.10. Am I charged for how much space I use on some average basis?

No. This is a capacity recharge similar to purchasing an N Terabyte disk dedicated for your use.

If you are utilizing only 1/2 of the space, you are still charged for your purchased capacity.

1.5.11. What happens if I can’t pay for my space?

RCIC can work with you to move data off of CRSP in a timely manner:

You will be required to bring your utilized capacity to be within your baseline allocation.

If a researcher is not reducing utilized capacity, access to all data in this space will be frozen (no read or write access).

If, after multiple attempts, the owner of the space remains unresponsive, data will be deleted to bring usage down to the baseline allocation.

1.5.12. Can researchers pool their allocations into one large space?

No. In extensive consultation with the RCIC Executive committee, we established that the people cost of tracking and implementing such combinations outweighs the benefits.

1.5.13. Any in/out network charges as with commercial cloud storage?

No. CRSP is connected at high speed to the campus network and leverages this existing resource.

1.5.14. I can’t access CRSP from home, why?

All access modes of CRSP require you to be connected to the UCI production network.

From home, you must use the campus VPN.

1.5.15. Can I add the UCI license to WebDrive I got from their website?

You cannot. You must use the RCIC-provided CRSP Desktop, which is a specialized version of WebDrive for Windows and Mac that already has the license key embedded. Please see Windows CRSP Desktop App and macOS CRSP Desktop App for instructions on how to download and use it.

1.5.16. I want to publish some of my data on the web, can I do that?

Not yet. This is more complicated than it might appear. The key questions revolve around data security.

1.5.17. I have trouble accessing CRSP shares 

Consult our CRSP Troubleshooting.

1.6. Disk Quotas 

1.6.1. Why do I get a file write error when saving files in my $HOME?

You exceeded your $HOME disk quota. See Quotas, which explains how to check your quota and fix the problem.

1.6.2. I can’t save files in my CRSP area. How do I check my quotas?

See Quotas for explanation.

1.6.3. I get the Disk quota exceeded error on /dfsX/labY. Why?

You need to check your quotas and verify directory permissions. See Quotas for instructions on checking quotas and Data transfer for tips on data transfers.

1.6.4. My Slurm job failed with the Disk quota exceeded on /dfsX/labY 

This is a group-writable area. All users who write in this area contribute to the quota, and the quota applies to the sum total of all written files. Even if your job outputs only small files, others may have filled the area. You need to check your Quotas for the specific DFS filesystem.
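The shared-quota behavior can be pictured with a simplified model. This is only a sketch of the idea, not how the DFS filesystem does its accounting; the user names, sizes, and function names are hypothetical.

```python
def group_usage_gb(usage_by_user):
    """Total space used in the group area: the sum over all users."""
    return sum(usage_by_user.values())

def would_exceed_quota(usage_by_user, quota_gb, new_write_gb):
    """True if writing new_write_gb more would push the group over its quota."""
    return group_usage_gb(usage_by_user) + new_write_gb > quota_gb

# Hypothetical lab area with a 1000 GB quota that other users have nearly filled
usage = {"alice": 600, "bob": 399}
print(would_exceed_quota(usage, quota_gb=1000, new_write_gb=2))  # True
```

In this model a job writing only 2 GB fails, because the group total rather than the individual contribution is what counts against the quota.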

1.7. Slurm jobs 

1.7.1. Can you give me an estimate of the expected wait times?

This is impossible to answer in general, because how long a job waits depends on many job-specific parameters and on the current cluster load:

If one asks for a generic core on the standard partition, the job is likely to schedule immediately.

Not all nodes in HPC3 have the same physical configuration. If only a small number of nodes can match a job's request, the job might wait days or even weeks.

It takes longer to reserve entire nodes because one has to wait for all other jobs on the node to complete.

Wait time is highly dependent on the current cluster load. During low-load periods jobs schedule quite quickly, and during high-load periods scheduling takes longer.

1.7.2. My job failed with an out of memory (OOM) error. What can I do?

The actual message can vary depending on where and how you run your application and may contain OOM Killed, oom_kill events or oom-kill.

OOM stands for Out Of Memory. It means that you requested a certain amount of memory, your job went over that limit, and Slurm terminated the job.

All partitions have specific configurations for memory, runtime, and CPUs.

You need to increase the memory requirements for your job. See How to get more memory.

For jobs that require more memory than the standard/free partitions can provide, or that require a lot of memory and not many CPUs, there is a limited number of higher memory nodes that are accessible via higher memory partitions. The Higher Memory guide explains how to request access.

1.7.3. Why should I request an interactive job and how do I do this?

Interactive jobs are simply processes that run on compute nodes of the cluster.

Users need to use an interactive job when they plan to:

run some tasks that take longer than 20 min

run CPU or memory intensive tasks

run applications (including GUI)

do data transfers

do conda/mamba installs

See how to request an Interactive job.

1.7.4. How do I submit a job to the Slurm queue and see its status?

Submit an interactive job with the srun command.

Submit a batch job with the sbatch command.

See the status of a submitted job with the squeue command.

See the Slurm jobs guide for examples.

1.7.5. What are array jobs and how do I submit them?

Array jobs are identical independent jobs that are run with different input parameters.

Instead of writing many submit scripts one can use a single script to submit many jobs.

This approach is much more efficient. See array jobs.
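As an illustration of the single-script idea, the Python sketch below generates a minimal array submit script. It relies on Slurm's `--array` option and the `SLURM_ARRAY_TASK_ID` environment variable; the program name `./my_program` and the `input_N.txt` naming are hypothetical, and partition and account settings are left out, so see the array jobs guide for complete examples.

```python
def array_job_script(n_tasks, program="./my_program"):
    """Return the text of a submit script that runs n_tasks array tasks."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --array=1-{n_tasks}",
        "#SBATCH --ntasks=1",
        "",
        "# Slurm sets SLURM_ARRAY_TASK_ID to a different index for each task,",
        "# so each task picks its own input file (hypothetical naming).",
        f"{program} input_${{SLURM_ARRAY_TASK_ID}}.txt",
    ]
    return "\n".join(lines) + "\n"

print(array_job_script(10))
```

One submission of the generated script replaces ten separate submit scripts that differ only in the input file name.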

1.7.7. How do I charge my jobs to my account or my PI’s account?

Every user has a default account (which is their UCInetID) and may have access to PI lab accounts. If no account is specified, the default account is charged (the exception is free queues).

See the Slurm guide for examples of how to specify accounts for interactive and batch jobs.

1.7.8. How do I buy more Slurm time?

Only a PI can purchase more hours. Please see Purchase core-hours.

A basic allocation is explained in Resource Allocations.

1.7.9. How do I ask for more cores for my job?

You need to specify the --ntasks or --cpus-per-task option in your job submission.

See Requesting Resources.
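A minimal sketch of how these two options appear on an sbatch command line is shown below. In general Slurm usage, `--ntasks` sets the number of tasks (for example MPI processes) and `--cpus-per-task` sets the cores given to each task (for example threads). The helper function and the script name `myjob.sh` are hypothetical.

```python
import shlex

def sbatch_command(script, ntasks=None, cpus_per_task=None):
    """Build an sbatch argument list with optional core requests."""
    args = ["sbatch"]
    if ntasks is not None:
        args.append(f"--ntasks={ntasks}")
    if cpus_per_task is not None:
        args.append(f"--cpus-per-task={cpus_per_task}")
    args.append(script)
    return args

# One task that uses 8 cores, e.g. a multithreaded program
print(shlex.join(sbatch_command("myjob.sh", cpus_per_task=8)))
# sbatch --cpus-per-task=8 myjob.sh
```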

1.7.10. If I ask for X cores does my job run X times faster?

Asking for more cores does not make your program run faster unless your program is capable of using multiple cores. The performance of a given program does not always scale with more CPUs.
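One common way to reason about this is Amdahl's law, which is a general model and not something specific to HPC3: if only a fraction p of a program's runtime can run in parallel, the speedup on n cores is at most 1 / ((1 - p) + p / n). The sketch below evaluates that formula; the function name is hypothetical.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work runs in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# A purely serial program gains nothing from extra cores
print(amdahl_speedup(0.0, 16))            # 1.0
# A program that is 90% parallel gets roughly 6.4x on 16 cores, not 16x
print(round(amdahl_speedup(0.9, 16), 2))  # 6.4
```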

1.7.11. How do I know if I need more cores for my job?

There are 2 distinct situations:

You have a program that is multi-CPU aware. Often such programs have a parameter that specifies the number of CPUs it will use. If the program has no such switch, or you don’t set the switch, your program is likely using 1 CPU.

Your job failed with OOM - out of memory errors.

See Requesting Resources for an explanation of how to get more CPUs or more memory.

1.7.12. How do I know if I need more memory for a job?

Your job failed with out of memory errors (OOM).

You have a general knowledge of how much memory your program is using on an input of a certain size and you have increased the input.

To find out how much memory and CPU your job is using, use the sacct, seff, and sstat commands. See job monitoring for details.

1.7.13. How do I profile my job?

Slurm records statistics for every job, including how much memory and CPU was used, and the usage efficiency.

Slurm's job efficiency monitoring tools give an idea of the memory and CPU consumed and how efficiently they were used. For most jobs, these tools provide sufficient information to understand what resources are needed.

1.7.14. How do I see how many hours of allocation credit I have used?

You need to use the sbank command. See Account balance.

1.7.15. How do I see what jobs were run and their cost over some time?

The zotledger tool provides this information. See Account balance.

1.7.16. I can’t submit jobs to a GPU partition, what is wrong?

You are likely using your regular CPU account.

You need to have a separate GPU account to submit jobs to paid GPU partitions.

All users can submit jobs to the free-gpu partition without a special GPU account.

GPU accounts are not automatically given to everyone. Your faculty adviser can request a GPU lab account and give you access to it. For example, a PI panteater may have accounts:

for CPU jobs - PANTEATER_LAB

for GPU jobs - PANTEATER_LAB_GPU

1.7.17. How do I use partitions highmem/hugemem/maxmem?

The Higher Memory guide explains how to request access.

1.7.18. Why is my job pending with an AssocGrpCPUMinutesLimit?

You don’t have enough hours in your account balance to run the job.

See Pending for an explanation and how to fix it.

1.7.19. My job was killed after running for 48 hours, why?

You ran your job with the default runtime, and Slurm killed the job once the runtime limit was reached. All queues have specific default and maximum runtime limits. The default runtime protects users from unintentionally using more CPU hours than intended.

If your job needs a longer runtime, you need to request runtime.
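Slurm's `--time` option accepts a limit in `days-hours:minutes:seconds` form, among other formats. The sketch below converts a number of hours into that form; the helper name is hypothetical, and the requested value still has to stay within the partition's maximum runtime.

```python
def slurm_time(hours):
    """Format a runtime in hours as Slurm's days-hours:minutes:seconds."""
    total_seconds = round(hours * 3600)
    days, rem = divmod(total_seconds, 86400)
    h, rem = divmod(rem, 3600)
    m, s = divmod(rem, 60)
    return f"{days}-{h:02d}:{m:02d}:{s:02d}"

print(slurm_time(72))   # 3-00:00:00
print(slurm_time(1.5))  # 0-01:30:00
```

The result can be used in a submit script as, for example, `#SBATCH --time=3-00:00:00`.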

1.7.20. My job needs more than 14 days, how do I request this?

First, you need to submit your job with the partition’s maximum runtime limit.

Second, request a job time limit modification.