3. DFS
3.1. Overview
Scalable storage on HPC3 in the /dfsX and /pub file paths are parallel file systems running BeeGFS on top of ZFS. This is a network-based multi-Petabyte storage cluster for the UCI campus research community. It is put in place so that researchers across UCI have a reliable and resilient location to store their research data and share with defined groups.
Performance of each file system is quite good (5-6 GByte/s) when used properly. Currently, there are multiple DFS systems with an aggregate throughput of more than 30 GByte/s.
Important
While each file system is capable, it is not too difficult for a single user to exceed the inherent capabilities and completely wreck performance for everyone.
It is beyond the scope of this document to describe parallel file systems in detail, but one should start learning more to make better use of the file systems.
- Take-home concepts about DFS parallel file systems are:
They perform well when reading/writing large files in good-sized (> 128KB) chunks.
They perform very poorly when reading/writing many small files.
All DFS systems are single copy, high-performance storage intended for scratch data. No special backup is available. Deleted data are gone forever.
Accessible only from HPC3 as a regular filesystem for storing data on the cluster. There are a few separate storage pools which are mounted on the cluster as /dfsX.
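Since many small files perform poorly, one common mitigation is to bundle them into a single archive before storing them. The following is a minimal sketch of that idea, not a site-specific recipe: the directory and file names (results, part_N.txt) are hypothetical, and it runs in a temporary directory rather than on /dfsX.

```shell
# Illustrative sketch: bundle many small files into one archive.
# All names here (results, part_N.txt) are hypothetical.
demo=$(mktemp -d)
mkdir "$demo/results"
for i in 1 2 3 4 5; do
    echo "sample $i" > "$demo/results/part_$i.txt"
done

# Write one larger archive instead of keeping five small files.
tar -C "$demo" -czf "$demo/results.tar.gz" results

# List the archive contents without unpacking and count the bundled files.
bundled=$(tar -tzf "$demo/results.tar.gz" | grep -c 'part_')
echo "bundled files: $bundled"
```

Reading one archive sequentially is the kind of large-chunk access that parallel file systems handle well; unpack it on node-local scratch if a job needs the individual files.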
For recharge allocations please see Purchase DFS storage.
Warning
DFS filesystems must not be used to store personally-identifiable information that would fall under guidelines such as FERPA (e.g. Student data) and HIPAA (health-care data).
If you are unsure whether DFS is suitable for your data, please refer to the general guidance on data security provided by the UCI Office of Research.
3.2. Allocations
There is NO separate account for DFS filesystems. Quotas are enforced using groups.
- No cost allocation - Private area:
All users have access to the Private Area. Each user is provided with a default allocation:
- 1TB quota per account in /pub/ucinetid
- 1TB backup quota for a selective backup
- Recharge allocation - Group shared area:
UCI Faculty members (PIs) can have low-cost recharge allocation(s) to fulfill their needs.
These group areas are quota allocations in /dfsX/group-lab-path based on PI’s purchase.
The PI is the storage owner.
The PI can specify additional users who can have read and write access to the area.
Please see Recharge Allocations for details on how to purchase.
3.3. Storing Files
- What to Store
Any frequently changing files
Any large input data files that are used for computational jobs
Transient job input/output/error files
Large user-authored or third-party software installations
- Where to Store
Pick a location depending on the type of data (private or group access):
- /pub/ucinetid
is a unique PRIVATE access area
is NOT shared with other users
do NOT change this directory's permissions
the organization of files and directories is up to the user
- /dfsX/<group-lab-path>
is a specific group shared area, users may have access to one or more group areas
all group members have read and write access
do NOT change directory permissions or sticky bit settings, see the warning below
the organization of files and directories is up to the group members
File permissions
File permissions are used in determining quotas. The permissions involve setting logical UNIX groups.
Important
When we create Private areas and Group shared areas on DFS we set correct permissions on the top level directories.
Warning
Each Group shared area is initially configured with the group sticky bit set so that only allowed users can access this area.
We advise users to NOT change permissions on the directories and files when writing in the group area.
Incorrect permissions can lead to quota exceeded errors.
Please see UNIX primer to familiarize yourself with UNIX groups and make sure you understand UNIX File permissions.
3.4. Quotas
All DFS-based file systems have quota enforcement for all private and group shared areas.
When writing in Private area users need to remember that:
Every user has a default personal group which is the same as their login.
The 1TB personal group quota is on /pub/ucinetid.
Every user has a default 1TB selective backup quota.
When writing in Group shared area users need to remember that:
All members of the group contribute to the quota. It’s the sum total usage that counts.
There are no individual user quotas in the Group shared area, only the group quota is used.
If you create a file with the incorrect group, you will likely see over quota errors.
When quotas are exceeded, all users in the group will no longer be able to write in the affected filesystem and will need to remove some files and directories to free space.
Users can’t change quotas, but a PI can submit a ticket asking to update the quota. Please see Purchase DFS storage.
Users can submit a ticket asking to be added to the group shared area.
Note
If you are submitting a ticket requesting to be added to a specific group for a specific filesystem access, please note we will need your PI confirmation in order to approve your request. Use a cc to your PI when submitting a ticket. The PI must confirm the requested change via email reply.
3.4.1. How to check
For all DFS file systems, including selective backup, one can use the dfsquotas command to check user/group quotas on a particular DFS pool.
To see the quotas for user panteater on private allocation in /dfs6:
$ dfsquotas panteater dfs6
==== Group quotas on dfs6 for user panteater
----------------------------------------------------------------------------
Group                  ||           Size           ||     Chunk Files
name          | id     ||    used    | allocated   ||   used  | allocated
-----------------------------------------------------------------------------
panteater_lab | 012345 ||  26.25 GiB | 1024.00 GiB || 1310459 | unlimited # see 1
alpha_users   | 158537 ||     0 Byte | 1024.00 GiB ||       0 | unlimited # see 2
panteater     | 000865 || 755.59 GiB | 1024.00 GiB ||  258856 | unlimited # see 3
The above shows that user panteater can write in the private area /pub/panteater using any of the 3 groups listed above:
panteater belongs to a supplementary group panteater_lab, and wrote 26.25 GiB of data using it.
panteater belongs to a supplementary group alpha_users, and did not write any files using this group, but can if needed.
Using the default panteater group, the user wrote ~756 GiB of the 1TB total allocation (shown as 1024.00 GiB).
Note
The groups listed above are logical UNIX groups associated with the user account, and the primary use of such groups is to assign “group ownership” of files and directories. The 1TB allocation is the total space that can be used by all of the user's listed UNIX groups combined, not by each group individually.
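Because the allocation is shared by all listed groups, the number that matters is the sum of the "used" column. The sketch below totals that column with awk. It is illustrative only: the sample rows are copied from the dfs6 listing above, and the parsing rules (field positions and unit names) are assumptions based on that listing, not a documented output format of dfsquotas.

```shell
# Sketch: total the "used" column of dfsquotas-style output in GiB.
# The sample rows come from the dfs6 listing; the field layout and unit
# names are assumptions inferred from that listing.
sample='panteater_lab | 012345 ||  26.25 GiB | 1024.00 GiB || 1310459 | unlimited
alpha_users   | 158537 ||     0 Byte | 1024.00 GiB ||       0 | unlimited
panteater     | 000865 || 755.59 GiB | 1024.00 GiB ||  258856 | unlimited'

sum_used_gib() {
    awk -F'|' '
        $2 ~ /^ *[0-9]+ *$/ {               # data rows have a numeric group id
            split($4, used, " ")            # for example "26.25" and "GiB"
            value = used[1]; unit = tolower(used[2])
            if (unit == "tib")       value *= 1024
            else if (unit == "mib")  value /= 1024
            else if (unit == "byte") value /= 1024 * 1024 * 1024
            total += value
        }
        END { printf "%.2f\n", total }'
}

total=$(printf '%s\n' "$sample" | sum_used_gib)
echo "used across all groups: $total GiB of 1024.00 GiB"
```

On the cluster the same function could read the live output, for example `dfsquotas panteater dfs6 | sum_used_gib`, provided the real output matches the layout assumed here.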
To see the quotas for user panteater in lab shared allocation in /dfs9:
$ dfsquotas panteater dfs9
==== Group quotas on dfs9 for user panteater
----------------------------------------------------------------------------
Group                  ||          Size         ||     Chunk Files
name          | id     ||   used    | allocated ||   used  | allocated
-----------------------------------------------------------------------------
panteater_lab | 012345 || 38.36 TiB | 40.00 TiB || 1310459 | unlimited # see 4
alpha_users   | 158537 ||    0 byte |    1 byte ||       0 |         1 # see 5
panteater     | 000865 ||    0 byte |    1 byte ||       0 |         1 # see 5
The above shows that user panteater can write in the group allocation on dfs9 only when using the UNIX group panteater_lab, for which there is a 40 TiB allocation. Note that the allocated space (40 TiB) and the used space (38.36 TiB) are totals for all users allowed to write in this area.
There is a 0 quota (shown as 1 byte) for the default personal group panteater and for the supplementary UNIX group alpha_users. If a user tries to write using these UNIX groups, the result will be permission and over quota errors.
To see the quotas on all DFS filesystems:
$ dfsquotas panteater all
==== Group quotas on dfs3b for user panteater
No quotas to report
==== Group quotas on dfs4 for user panteater
No quotas to report
==== Group quotas on dfs5 for user panteater
No quotas to report
==== Group quotas on dfs6 for user panteater
----------------------------------------------------------------------------
Group                  ||           Size           ||     Chunk Files
name          | id     ||    used    | allocated   ||   used  | allocated
-----------------------------------------------------------------------------
panteater_lab | 012345 ||  26.25 GiB | 1024.00 GiB || 1310459 | unlimited
alpha_users   | 158537 ||     0 Byte | 1024.00 GiB ||       0 | unlimited
panteater     | 000865 || 755.59 GiB | 1024.00 GiB ||  258856 | unlimited
==== Group quotas on dfs7 for user panteater
No quotas to report
==== Group quotas on dfs8 for user panteater
No quotas to report
==== Group quotas on dfs9 for user panteater
----------------------------------------------------------------------------
Group                  ||          Size         ||     Chunk Files
name          | id     ||   used    | allocated ||   used  | allocated
-----------------------------------------------------------------------------
panteater_lab | 012345 || 38.36 TiB | 40.00 TiB || 1310459 | unlimited
alpha_users   | 158537 ||    0 byte |    1 byte ||       0 |         1
panteater     | 000865 ||    0 byte |    1 byte ||       0 |         1
When you see No quotas to report, it means there are no quotas for the user on this specific DFS filesystem.
3.4.2. Over quotas
When quota is filled, the users will not be able to write any files or directories and submitted jobs will fail with quota exceeded errors.
Quota is enforced by the file system based upon the Unix group membership of a particular file. For example:
$ ls -l
total 55524423
drwxrwsr-x 7 panteater bio 7 Aug 5 2019 biofiles
-rw-r--r-- 1 panteater panteater 4294967296 May 31 2019 performance.tst
drwxrwsr-x 3 panteater panteater 2 Oct 8 17:11 myfiles
The user panteater is storing files under two different groups:
the files in the subdirectory biofiles are charged to the bio group quota.
the file performance.tst and subdirectory myfiles are charged to the panteater group quota.
Examine the permissions of the directories: drwxrwsr-x. Notice the s in the group execute position (the group permissions occupy character positions 5-7). This document calls it the sticky bit for the directory; UNIX documentation usually calls it the setgid bit. The difference is subtle but important: s instead of x in the group execute permission. Compare to permissions without the sticky bit:
Sticky bit | Directory mode | Description |
---|---|---|
is set | drwxrwsr-x | Files and directories created in this directory are written with the group of the directory itself. The sticky bit s is set. |
is NOT set | drwxrwxr-x | Files and directories created in this directory are written with your active UNIX group, which defaults to your login. |
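The difference between the two rows can be observed directly. The following is a small sketch, assuming a Linux system, that creates one directory with the group s bit and one without, then makes a subdirectory in each. The directory names are hypothetical and everything happens in a temporary directory.

```shell
# Sketch: a subdirectory created inside a directory with the group s bit
# inherits that bit (Linux behavior). Names are hypothetical.
top=$(mktemp -d)
mkdir "$top/shared" "$top/plain"
chmod g+s "$top/shared"          # set the group s bit on one directory
chmod g-s "$top/plain"           # make sure the other one has no s bit

mkdir "$top/shared/sub" "$top/plain/sub"

# ls -ld prints the mode string; compare s and x in the group triplet.
ls -ld "$top/shared/sub" "$top/plain/sub"
```

The first line of output should show an s in the group permissions and the second an x. The group name shown is the same in both cases here because the demonstration uses only the user's own group.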
The Unix command newgrp can be used to change the active Unix group.
For example, the user panteater by default has a group panteater.
The following sequence of simple commands shows the ownership of the files created under different groups and shows how to use the newgrp command.
$ id panteater
uid=1234567(panteater) gid=1234567(panteater) groups=1234567(panteater),158571(bio)
$ touch aaa
$ ls -l aaa
-rw-rw-r-- 1 panteater panteater 0 Nov 3 14:57 aaa
$ newgrp bio
$ touch bbb
$ ls -l bbb
-rw-rw-r-- 1 panteater bio 0 Nov 3 14:57 bbb
Please type man newgrp to learn about this command.
- Reasons for Over Quota
Under normal operation, when the sticky bit is set on a directory, the correct quota enforcement occurs automatically because files and subdirectories are written with the correct group; no newgrp command is needed. When all space is used, an over quota error is issued.
The most common quota problems on DFS result from:
inadvertently removing the sticky bit on a directory and then writing with the default personal group.
changing the group ownership of a file or directory and then trying to write to it with the default personal group.
In these cases writing files and running jobs can fail.
Moving data to HPC3 with software that overrides the sticky bit by explicitly setting permissions is the most common way a sticky bit becomes unset.
Note
Please see Data transfer for information on how to move data to the cluster.
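As an illustration of a transfer that does not override the destination's settings, the sketch below uses rsync without its permission and group options. It is an assumption-laden example, not the site's recommended procedure: the source and destination are hypothetical temporary directories, and the block only runs the copy if rsync is installed.

```shell
# Sketch: copy a tree with rsync without forcing the source permissions or
# group onto the destination. Paths are hypothetical temporary directories.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir "$src/run1"
echo "data" > "$src/run1/out.txt"
chmod g+s "$dst"                 # destination behaves like a group area

if command -v rsync >/dev/null 2>&1; then
    # -r recurse, -l keep symlinks, -t keep timestamps.
    # Leaving out -p and -g (both implied by -a) lets new directories
    # inherit the s bit and the group from the destination directory.
    rsync -rlt "$src"/ "$dst"/
    ls -ld "$dst/run1"
else
    echo "rsync is not installed on this machine"
fi
```

By contrast, `rsync -a` includes `-p` and `-g`, which copy the source mode and group onto the destination and are a typical way the s bit gets lost.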
3.4.3. Fix over quotas
- Fixing Permissions
You can use the chmod command to fix directories that don’t have a sticky bit set, but should have. The following command will add the sticky bit to a particular directory:
$ chmod g+s directory-name
You can use the find command to find all directories in a subtree and combine it with the chmod command to set the sticky bit on all found directories:
$ find . -type d -exec chmod g+s {} \; -print
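Before changing anything, it can help to list only the directories that are actually missing the bit. The sketch below does that on a hypothetical tree in a temporary directory, using the octal form of the find -perm test (2000 is the setgid bit that chmod g+s sets).

```shell
# Sketch: list directories that are missing the group s bit, then fix them.
# The tree (a, a/b, c) is hypothetical.
tree=$(mktemp -d)
mkdir -p "$tree/a/b" "$tree/c"
chmod g+s "$tree" "$tree/a"
chmod g-s "$tree/a/b" "$tree/c"  # these two are left without the bit

# -perm -2000 matches entries with the group s bit; "!" negates the test.
echo "before:"
find "$tree" -type d ! -perm -2000

# Fix only the directories that need it.
find "$tree" -type d ! -perm -2000 -exec chmod g+s {} +

missing=$(find "$tree" -type d ! -perm -2000 | wc -l)
echo "directories still missing the bit: $missing"
```

Running the listing step first shows the scope of the problem and leaves correctly configured directories untouched.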
- Fixing Group Ownership
You can also use the chgrp and chown commands to change the group ownership of a file or directory. For example, to change the group from panteater to bio on a specific file or directory:
$ ls -l
total 55524423
drwxrwsr-x 7 panteater bio 7 Aug 5 2019 biofiles
-rw-r--r-- 1 panteater panteater 4294967296 May 31 2019 performance.tst
drwxrwsr-x 3 panteater panteater 2 Oct 8 17:11 myfiles
$ chgrp bio performance.tst
$ chown -R panteater:bio myfiles
$ ls -l
total 55524423
drwxrwsr-x 7 panteater bio 7 Aug 5 2019 biofiles
-rw-r--r-- 1 panteater bio 4294967296 May 31 2019 performance.tst
drwxrwsr-x 3 panteater bio 2 Oct 8 17:11 myfiles
The ls -l command is used to show permissions before and after the change.
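To find which entries need a group change in a larger tree, find can list everything whose group differs from the expected one. This is a minimal sketch: expected_group is a placeholder variable (on the cluster it would be a lab group such as bio), and the demonstration uses the current user's own group in a temporary directory so that it runs anywhere.

```shell
# Sketch: find entries whose group differs from the expected one.
# expected_group is a placeholder; in a lab area it would be the lab group.
area=$(mktemp -d)
touch "$area/one" "$area/two"
expected_group=$(id -gn)         # stand-in for a group such as bio

# -group matches the named group; "!" lists everything else.
mismatched=$(find "$area" ! -group "$expected_group" | wc -l)
echo "entries not owned by $expected_group: $mismatched"
```

Dropping the `| wc -l` prints the paths themselves, which can then be passed to chgrp.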
3.5. Selective Backup
We cannot back up everything on the cluster. Selective Backup allows users to choose what is important and have it automatically saved. The physical location of the backup server is different from the cluster location for extra protection.
Note
You will want to back up only critical data such as scripts, programs, etc.
Warning
DO NOT back up data you can get from other sources, especially large data-sets.
Important
If you go past your backup quota, backups stop for your account. The backup will fail because no new data can be written to the backup server once you have reached your limit.
3.5.1. Default settings
The Selective Backup is based on rsync in conjunction with GNU Parallel. The combination maximizes the network throughput and server capabilities in order to back up hundreds of user accounts from multiple public and private filesystems.
The Selective Backup process will automatically start saving your home directory as well as some public and private disk spaces.
Note
Users manage their Selective Backup via two control files located in their $HOME directory:
.hpc-selective-backup This file lists (1) backup options and (2) the names of files/directories to be saved, in order of priority from the most to the least important. All backup options are initially commented out.
The files are backed up in the order they are listed. That way, if a user runs out of selective backup quota before all listed files have been backed up, at least their most prized data are saved. By default, this file contains the $HOME and /pub areas of your account:
/data/homezvolX/ucinetid
/pub/ucinetid
The following table lists all available backup options:
Selective Backup Option | What it does |
---|---|
HPC_SEND_EMAIL_SUMMARY | Sends you daily email summaries of your saves. Default is NO summary email notifications. |
HPC_SEND_EMAIL_ON_ERROR | Sends you an email only if rsync completes with an error, that is, a non-zero exit status from rsync. Consult the man rsync page for error values and meaning. Default is NO email notifications. |
HPC_KEEP_DELETED=X | Keep deleted files on the backup server for X days where X is a number in 0-90 range. Deleted files are files you removed from the source location. Default is 14 days. |
.hpc-selective-backup-exclude This file lists the names of files/directories you want to exclude from backup. By default, this file excludes ZFS snapshots from $HOME:
$HOME/.zfs
For more information on rsync exclude patterns please see the “ANCHORING INCLUDE/EXCLUDE PATTERNS” section of the man rsync command output.
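Because backups stop once the backup quota is reached, it can be useful to estimate how much data the listed paths hold. The sketch below is a rough illustration under stated assumptions: it treats lines starting with "#" as comments, lines starting with HPC_ as options, and every other non-empty line as a path. That reading of the file format is inferred from the description above, and the file used here is a hypothetical stand-in, not a real control file.

```shell
# Sketch: estimate how much data a backup list would save.
# Assumed file format: "#" starts a comment, lines beginning with HPC_
# are options, and every other non-empty line is a path.
work=$(mktemp -d)
mkdir "$work/scripts"
echo "echo hello" > "$work/scripts/run.sh"
cat > "$work/backup-list" <<EOF
# HPC_SEND_EMAIL_SUMMARY
HPC_KEEP_DELETED=14
$work/scripts
EOF

estimate_kib() {
    total=0
    while IFS= read -r line; do
        case $line in
            ''|'#'*|HPC_*) continue ;;          # skip blanks, comments, options
        esac
        size=$(du -sk "$line" | cut -f1)        # size of this path in KiB
        total=$((total + size))
    done < "$1"
    echo "$total"
}

kib=$(estimate_kib "$work/backup-list")
echo "listed paths use about $kib KiB"
```

The result is only an estimate, since deleted files kept on the backup server also count towards the backup quota.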
3.5.2. Custom settings
To customize, edit control files with your favorite editor. We highly recommend the following:
request email notifications to make sure things are working
Choose one of two SEND_EMAIL options in .hpc-selective-backup file and uncomment it (remove the # sign at the beginning of the line). For example, if you choose to receive email notifications in the event of errors, edit your configuration file and change the line:
# HPC_SEND_EMAIL_ON_ERROR
to:
HPC_SEND_EMAIL_ON_ERROR
perform some spot checks of what you think is being saved to make sure your data is indeed being backed up.
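The uncommenting step can also be done non-interactively. The sketch below shows the idea on a hypothetical stand-in for the control file, created in a temporary location; the file contents are assumed from the description above, so check your real $HOME/.hpc-selective-backup before adapting it.

```shell
# Sketch: uncomment one option in a copy of the control file.
# The file below is a hypothetical stand-in for $HOME/.hpc-selective-backup.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# HPC_SEND_EMAIL_SUMMARY
# HPC_SEND_EMAIL_ON_ERROR
/pub/ucinetid
EOF

# Remove the leading "# " from the chosen option only.
sed 's/^# *HPC_SEND_EMAIL_ON_ERROR$/HPC_SEND_EMAIL_ON_ERROR/' "$conf" > "$conf.new"
mv "$conf.new" "$conf"

cat "$conf"
```

Writing to a new file and renaming it avoids relying on in-place editing options that differ between sed implementations.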
3.5.3. Where backups are
A user can access backup files on the login nodes of the cluster from the following paths:
Where | What |
---|---|
/sbak/zvolX/backups/ucinetid/data/homezvolX/ucinetid | user $HOME |
/sbak/zvolX/backups/ucinetid/pub/ucinetid | /pub/$USER/ |
/sbak/zvolX/backups/ucinetid/DELETED-FILES | deleted files by date (counts towards backup quota) |
/sbak/zvolX/logs/$DATE/ucinetid | backup logs by date, available for the past Y days |
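One way to spot-check a backup is to compare a source directory with its copy under the backup path. The following is a minimal sketch of such a comparison: spot_check is a hypothetical helper, and both directories are stand-ins created in a temporary directory. On the cluster the arguments would be a source path and the matching path from the table above.

```shell
# Sketch: compare a source directory with its backup copy.
# Both directories are hypothetical stand-ins created under a temp directory.
root=$(mktemp -d)
mkdir -p "$root/source/scripts" "$root/backup/scripts"
echo "v1" > "$root/source/scripts/run.sh"
cp -p "$root/source/scripts/run.sh" "$root/backup/scripts/run.sh"

spot_check() {
    # diff -r walks both trees; -q reports only which files differ.
    if diff -rq "$1" "$2" >/dev/null; then
        echo "match"
    else
        echo "differ"
    fi
}

spot_check "$root/source" "$root/backup"      # copies are identical
echo "v2" > "$root/source/scripts/run.sh"     # edit made after the last backup
spot_check "$root/source" "$root/backup"
```

A "differ" result is expected for files changed since the last backup run, so the check is most informative on data that has not been modified recently.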
3.6. Deleted Files Recovery
Note
Below is a general procedure for user panteater to restore the directory spring-2022, and the files in it, after accidentally deleting it from /pub/panteater.
$ cd /sbak/zvol0/backups/panteater/DELETED-FILES # see 1
$ find . -type d -name spring-2022 # see 2
./2024-0214/pub/panteater/spring-2022
./2024-0213/pub/panteater/spring-2022
$ ls ./2024-0214/pub/panteater/spring-2022/ # see 3
schedule1 schedule1.sub slurm.template
$ cp -p -r ./2024-0214/pub/panteater/spring-2022 /pub/panteater # see 4
The above commands mean:
The cd command puts you at the top level of the backup directory for your files.
The find command finds all backups by date where the desired directory exists. Here, two snapshots are found by date: 2024-0214 and 2024-0213.
Run the ls command on a specific snapshot to see whether it has the needed files.
If the needed files exist in the backup, use the cp command to copy them back to the pub directory. The -p and -r options are recommended: -p makes sure that the copy preserves the time stamp and the ownership of a file, and -r means “copy recursively”, which is needed when copying a directory and its contents.
Files and directories deleted from $HOME can be restored in a similar way.
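When several snapshots contain the deleted directory, the newest one is usually the one to restore. The sketch below picks it by sorting the snapshot paths. It assumes the YYYY-MMDD snapshot naming shown in the example above, and it works on a hypothetical copy of that layout in a temporary directory rather than on /sbak.

```shell
# Sketch: pick the most recent DELETED-FILES snapshot holding a directory.
# The tree below is a hypothetical copy of the layout in the example.
bak=$(mktemp -d)
mkdir -p "$bak/2024-0213/pub/panteater/spring-2022" \
         "$bak/2024-0214/pub/panteater/spring-2022"

# Snapshot names sort chronologically (assuming YYYY-MMDD naming),
# so the last path after sorting belongs to the newest snapshot.
latest=$(cd "$bak" && find . -type d -name spring-2022 | sort | tail -n 1)
echo "$latest"
```

The selected path can then be inspected with ls and copied back with cp -p -r as described in the procedure.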