
How to Use BeeGFS Parallel Storage
1. Introduction
Scalable storage on HPC3 under the /dfsX and /pub paths is provided by parallel file systems running BeeGFS on top of ZFS. Performance of each file system is quite good (5-6 GByte/s) when used properly. Currently there are six such systems with an aggregate throughput of more than 30 GByte/s. While each file system is capable, it is not too difficult for applications to exceed the inherent capabilities and completely wreck performance for everyone. It is beyond the scope of this document to describe parallel file systems in detail, but one should start learning more to make better use of them. Take-home concepts about parallel file systems are:
- The largest performance bottleneck in any parallel file system is accessing the metadata server.
- Parallel file systems perform well when reading/writing large files in good-sized (> 128KB) chunks.
- They perform very poorly when reading/writing many small files.
- All DFS systems are intended for scratch data. While storage is RAID protected, it is only single-copy with no snapshots. Deleted data is gone forever.
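As an illustration of these take-home concepts, the sketch below writes one large file in 1 MiB chunks (a friendly access pattern) and bundles a directory of many small files into a single archive instead of copying them individually; the file and directory names are hypothetical:

[user@login-x:~]$ dd if=/dev/zero of=/pub/$USER/bigfile.dat bs=1M count=1024   # one large file, written in 1 MiB chunks
[user@login-x:~]$ tar czf /pub/$USER/smallfiles.tar.gz ./my-small-files/      # many small files bundled into one archive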
2. Account
There is NO separate account for BeeGFS filesystems. Each user is provided:
- 1TB quota in /pub/$USER. Note, /pub/$USER is a short name for /dfs6/pub/$USER.
- 1TB backup quota for a selective backup.
- /dfsX/<lab-path> lab group quota (based on the PI's purchase allocation). The storage owner (PI) can specify which users have read/write capability on the specific filesystem.
If you submit a ticket requesting to be added to a specific group for access to a specific filesystem, please note that we will need your PI's confirmation in order to approve your request.
2.1. What to Store
- any large input data files that are used for computational jobs
- transient job input/output/error files
- any frequently changing files
- large user-authored software installations
2.2. Where to Store
Pick a location depending on the type of data (personal or group access):

- Private area - /pub/$USER
Each user gets a 1TB allocation in /pub/$USER, which is a unique private access area. Use this for data you don't need to share with anyone.
- Group area - /dfsX/<lab-path>
Most users have access to one or more group-shared areas. Within this area, all group members have read and write access. The organization is up to the group members, with the following exception.
This area is initially created with the group sticky bit set so that only allowed users can access it. We advise users NOT to change permissions on directories and files when writing in the group area; incorrect permissions can lead to quota exceeded errors.
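To check that the sticky bit is still present on a group area, list the directory itself; a quick check, where the lab path and the output shown are hypothetical (note the s in the group permissions):

[user@login-x:~]$ ls -ld /dfsX/panteater_lab
drwxrws--- 5 root panteater_lab 5 Jan 10  2021 /dfsX/panteater_lab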
2.3. File permissions
File permissions are used in determining quotas.
In Unix, all data is organized into files, files are organized into directories, and directories are organized into a tree-like structure called the filesystem.
In Unix, there are three basic types of files:
- ordinary file: a file on the system that contains data, text, or program instructions.
- directory: directories store special and ordinary files. Unix directories are equivalent to folders on Windows or Mac OS.
- special file: a file that can provide access to hardware such as hard drives, or a symbolic link.
Every file in Unix has the following access modes:
- read, denoted as r: the capability to read or view the contents of the file.
- write, denoted as w: the capability to modify or remove the contents of the file.
- execute, denoted as x: the capability to run a file as a program.
- sticky bit, denoted as s: additional capability to set permissions for Set User ID (SUID) and Set Group ID (SGID) bits.
Every file in Unix has the following attributes or permissions:

- owner: determines what actions the owner of the file can perform on the file.
- group: determines what actions a user who is a member of the group that the file belongs to can perform on the file.
- other (world): determines what actions all other users can perform on the file.
File permissions can be displayed using the ls -l command:

[user@login-x:~]$ ls -l
total 55524423
drwxrwsr-x 7 panteater bio                7 Aug  5  2019 biofiles
-rw-r--r-- 1 panteater panteater 4294967296 May 31  2019 performance.tst
The first column in the output represents file type and its associated permissions. For example, drwxrwsr-x for biofiles means:
- the 1st character d is the file type, in this case a directory
- the next three characters rwx in positions (2-4) are the file's owner permissions: the owner has read (r), write (w) and execute (x) permission.
- the second group of three characters rws in positions (5-7) is the permissions for the group to which the file belongs: the group has read (r), write (w), execute (x) permission, and the sticky bit is set.
- the third group of three characters r-x in positions (8-10) represents the permissions for everyone else: here, read (r) and execute (x).
3. Check DFS Quotas
Users are granted space with default allocations. Groups (PIs) can purchase additional allocations.
- Every user's default group is the same as their login. We call this the personal group. The only personal quota is on /dfs6/pub/<UCNetID>; the rest are group quotas.
- Users have 1 byte quotas on all DFS systems (apart from the personal quota); it is the group quota that is used. If you create a file with the incorrect group, you will likely see over quota errors.
- When writing in a group area, users need to remember that all members of the group contribute to the quota. It is the sum total usage that counts. When a quota is exceeded, users can no longer write in the affected filesystem and will need to remove some files and directories to free space.
- Users can't change quotas, but they can submit a ticket asking to be added to a group quota, provided there is a confirmation from the PI about the change.
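To see which groups your files can be charged to, check your active and supplementary groups with the id command (the IDs and group names shown are hypothetical):

[user@login-x:~]$ id
uid=1234567(panteater) gid=1234567(panteater) groups=1234567(panteater),12345(panteater_lab),158537(alpha_users)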
For DFS file systems one can use the dfsquotas command to check user/group quotas on a particular DFS pool.
To see the quotas for user panteater on DFS pool /dfs6:
[user@login-x:~]$ dfsquotas panteater dfs6

==== [Group Quotas on dfs6]
Quota information for storage pool Default (ID: 1):

    user/group ||        size        ||    chunk files
     name |  id ||     used |    hard ||    used |    hard
--------------|------||------------|------------||---------|---------
 panteater_lab|012345||  26.25 TiB|  50.00 TiB|| 1310459| 18500000   (1)
   alpha_users|158537||     0 Byte|     1 Byte||       0|        1   (2)
     panteater|000865|| 755.59 GiB|1024.00 GiB||  258856|unlimited   (3)
The above shows that the user panteater:

(1) can write in the allocation for the group panteater_lab, where the total space is 50TB and ~26TB of it is already used. Note, the space used by the group includes all users allowed to write in this area.
(2) belongs to a supplementary group alpha_users; this group has no allocation (1 byte) and the user will not be able to store any files that have this group ownership.
(3) can write in the personal /pub/panteater area, where the default allocation is 1TB and ~756GB is already used by the user.
4. Over Quota
All BeeGFS-based file systems have quota enforcement. When a quota is filled, users will not be able to write any files or directories, and submitted jobs will fail with quota exceeded errors.
Quota is enforced by the file system based upon the Unix group membership of a particular file. For example,
[user@login-x:~]$ ls -l
total 55524423
drwxrwsr-x 7 panteater bio                7 Aug  5  2019 biofiles
-rw-r--r-- 1 panteater panteater 4294967296 May 31  2019 performance.tst
drwxrwsr-x 3 panteater panteater          2 Oct  8 17:11 myfiles
The user panteater is storing files under two different groups: bio and panteater. The file performance.tst is charged against the panteater group quota, while the files in the subdirectory biofiles should be charged to the bio group quota.
Examine the permissions of the directories: drwxrwsr-x. Notice the 's' for the group execute permissions. This is called the sticky bit for the directory. Compare to permissions without sticky: drwxrwxr-x. It’s subtle, but important ('x' instead of 's' in the group execute permission).
Sticky bit means:

- With the sticky bit set (drwxrwsr-x): files written and directories created take the group membership of the parent directory (the group is "sticky"). The sticky bit is also set on any newly-created subdirectories.
- With the sticky bit NOT set (drwxrwxr-x): files written into the directory take the user's active Unix group. Since this defaults to your login group, that may not be what you want or expect. The Unix command id shows your active and supplementary groups.
Under normal operation, when the sticky bit is set on a directory, the correct quota enforcement occurs automatically because files and subdirectories are written with the correct group.
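Here is a minimal demonstration of that inheritance, assuming you are a member of a group bio; the demo directory and the output shown are hypothetical:

[user@login-x:~]$ mkdir demo                        # created with your personal group
[user@login-x:~]$ chgrp bio demo && chmod g+s demo  # give it group bio and set the sticky bit
[user@login-x:~]$ touch demo/newfile
[user@login-x:~]$ ls -l demo/newfile
-rw-rw-r-- 1 panteater bio 0 Oct  8 17:20 demo/newfile

The new file inherited the directory's group bio rather than the user's personal group, so it is charged against the bio group quota.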
The most common quota problems on DFS result from inadvertently removing the sticky bit on a directory and then writing with the default (the user's personal group). In this case writes (and jobs) can fail.
Moving data to HPC3 with software that overrides the sticky bit by explicitly setting permissions is the most common way a sticky bit becomes unset.
4.1. Fixing Permissions
You can use the chmod command in Unix to fix directories that don't have a sticky bit set, but should. The following command will add the sticky bit to a particular directory:
[user@login-x:~]$ chmod g+s <directory>
You can use the find command to change all directories (including the current one) in a subtree to have the sticky bit set:
[user@login-x:~]$ find . -type d -exec chmod g+s {} \; -print
4.2. Fixing Group Ownership
You can also use the chgrp command to change the group ownership of a file or directory. For example, this command would change the group from panteater to bio in the example listing above:
[user@login-x:~]$ chgrp bio performance.tst
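To change the group on a directory and everything inside it, chgrp also accepts a recursive flag, shown here on the biofiles directory from the listing above:

[user@login-x:~]$ chgrp -R bio biofiles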
5. Data transfer
If you need to bring some data from your laptop or another host to the cluster, you will mainly need to use the scp (there is an equivalent command for Windows) or rsync commands. You will need to give extra command-line parameters to ensure that the data transfer program you use respects the sticky bit and does not cause quota issues.
5.1. Using scp
Scp is a secure file transfer protocol. Scp allows one to connect to a remote server and transmit desired files via the connection. However, transferred directories do not inherit the sticky bit of the destination. This is not a problem if users are copying files to /pub/$USER, but it is a problem when copying to a /dfsX/<lab-path> area and usually results in quota exceeded errors.
There are two ways to deal with this.
- Scenario 1
Scp the needed files (using recursive directives if needed). For example, a user has access to a group allocation /dfsX/panteater_lab/panteater and wants to transfer data there.
On your laptop or other server:
scp -r mydata panteater@hpc3.rcic.uci.edu:/dfsX/panteater_lab/panteater
On the cluster check the permissions on the transferred directory:
[user@login-x:~]$ ls -l /dfsX/panteater_lab/panteater
total 138
drwxr-xr-x 6 panteater panteater_lab 18 Feb 18 13:10 mydata
Note, the permissions drwxr-xr-x are missing s (the sticky bit is not set), and this means all subdirectories under mydata are also missing it. You will need to fix the permissions on mydata:
[user@login-x:~]$ chmod g+s /dfsX/panteater_lab/panteater/mydata
and, similarly, on all subdirectories under it, as sketched below.
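One way to fix the whole transferred tree at once is the find command from Fixing Permissions, pointed at the directory from this example:

[user@login-x:~]$ find /dfsX/panteater_lab/panteater/mydata -type d -exec chmod g+s {} \; -print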
- Scenario 2 (requires less work and is more accurate)
On your laptop (or remote server) create a compressed tar file of the files you want to transfer, and then scp the compressed file:

tar czvf mydata.tar.gz mydata
scp mydata.tar.gz panteater@hpc3.rcic.uci.edu:/dfsX/panteater_lab/panteater
On the cluster, uncompress the transferred file and check permissions:

[user@login-x:~]$ cd /dfsX/panteater_lab/panteater
[user@login-x:~]$ tar xzf mydata.tar.gz
[user@login-x:~]$ ls -l
total 138
drwxr-sr-x 6 panteater panteater_lab 18 Feb 18 13:12 mydata
[user@login-x:~]$ ls -l mydata
total 124
-rw-r--r-- 1 panteater panteater_lab 17075 Jul 21  2020 desc.cvs
-rwxr-xr-x 1 panteater panteater_lab  7542 Jul 21  2020 README
drwxr-sr-x 2 panteater panteater_lab     4 Feb 18 12:03 common
drwxr-sr-x 2 panteater panteater_lab     3 Feb 18 12:03 images
Note, the permissions drwxr-sr-x on mydata include s, and all directories under mydata inherited it. Delete the transferred mydata.tar.gz after verification.
5.2. Using rsync
Rsync is a program that can greatly speed up file transfers. See man rsync for more information and options to use.
There are two options of the rsync command that will overwrite the destination permissions; this is a common issue users encounter:

- -p, --perms      preserve permissions
- -a, --archive    archive mode; same as -rlptgoD, implies -p

When the -p option is used, rsync preserves the permissions of the source, which is not correct for files and directories at the destination that need to comply with the destination's user:group permissions.
Avoid using the -p and -a options when running rsync commands.
For example, for a recursive copy use:
rsync -rv mydata panteater@hpc3.rcic.uci.edu:/dfsX/panteater_lab/panteater
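If you also want to preserve symlinks and modification times without copying the source permissions, one possible flag set (an illustration, not the only valid choice) is -rltv, i.e. recursive, links, times, verbose:

rsync -rltv mydata panteater@hpc3.rcic.uci.edu:/dfsX/panteater_lab/panteater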
6. Selective Backup
We cannot back up everything on the cluster. Selective Backup allows users to choose what is important and have it automatically saved. The physical location of the backup server is different from the cluster location for extra protection.
You will want to back up only critical data such as scripts, programs, etc.
DO NOT back up data you can get from other sources, especially large data-sets.
If you go past your backup quota, backups stop for your account. The backup will fail because no new data can be written to the backup server once you have reached your limit.
6.1. Default settings
The Selective Backup is based on rsync in conjunction with GNU Parallel. The combination maximizes network throughput and server capabilities in order to back up hundreds of user accounts from multiple public and private filesystems.
The Selective Backup process will automatically start saving your home directory as well as some public and private disk spaces. There is nothing for you to do if you like the defaults.
Users manage their Selective Backup via two control files located in their $HOME directory:
- .hpc-selective-backup
The .hpc-selective-backup file lists (1) backup options and (2) the names of files/directories to be saved, in order of priority from the most to the least important. All backup options are initially commented out. The files are backed up in the order they are listed; that way, if a user runs out of selective backup quota before all listed files have been backed up, at least their most prized data are saved. By default, this file contains the $HOME and /pub areas of your account:

/data/homezvol0/panteater
/pub/panteater
- .hpc-selective-backup-exclude
This file lists the names of files/directories you want to exclude from the backup. By default, this file excludes ZFS snapshots from $HOME:

$HOME/.zfs
For more information on rsync exclude patterns please see the "ANCHORING INCLUDE/EXCLUDE PATTERNS" section of man rsync.
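For instance, a user who wants to skip a scratch directory and a large downloaded data-set might add lines such as these to .hpc-selective-backup-exclude (the paths are hypothetical):

$HOME/scratch-tmp
/pub/panteater/reference-genomes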
The following table lists all available backup options for .hpc-selective-backup and what they do:
Selective Backup Option | What It Does
------------------------|-------------
HPC_SEND_EMAIL_SUMMARY  | Sends you daily email summaries of your saves. Default is NO summary email notifications.
HPC_SEND_EMAIL_ON_ERROR | You will receive an email if rsync completes with an error, an error being a non-zero exit status from rsync. Consult the rsync man page for error values and their meaning. If no errors are found with rsync, no email is sent. Default is NO email notifications.
HPC_KEEP_DELETED=X      | Keep deleted files on the backup server for X days, where X is a number anywhere from 0 to 90. Deleted files are files you removed from the source location. Default is 14 days.
6.2. Custom settings
To customize, edit the control files with your favorite editor.
We highly recommend that you:

- request email notifications to make sure things are working.
Choose one of the two SEND_EMAIL options in the .hpc-selective-backup file and un-comment it (remove the # sign at the beginning of the line). For example, if you choose to receive email notifications in the event of errors, edit your configuration file and change the line:
# HPC_SEND_EMAIL_ON_ERROR
to this:
HPC_SEND_EMAIL_ON_ERROR
- perform some spot checks of what you think is being saved to make sure your data is indeed being backed up.
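Putting it together, a customized .hpc-selective-backup might look like the sketch below: email-on-error and a 30-day retention of deleted files are enabled (hypothetical choices), and the default paths are kept in priority order:

HPC_SEND_EMAIL_ON_ERROR
HPC_KEEP_DELETED=30
/data/homezvol0/panteater
/pub/panteater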
6.3. Where backups are
A user can access backup files on the login nodes of the cluster:
Path                                                          | Description
--------------------------------------------------------------|------------
/sbak/selective-backup/hpc-backups/$USER/data/homezvol*/$USER | $HOME on HPC3
/share/legacyhpc/users/$USER                                  | old $HOME from HPC. Removing July 1, 2021
/sbak/selective-backup/hpc-backups/$USER/pub/$USER            | /pub/$USER
/sbak/selective-backup/hpc-backups/$USER/DELETED-FILES/$DATE  | deleted files by date; count towards backup quota
/sbak/selective-backup/hpc-logs/$DATE/$USER                   | backup logs, available for the past X days
6.4. Quotas for Selective Backup
To see the quota for selective backup:
[user@login-x:~]$ dfsquotas panteater sbak
/data/homezvol0/panteater

==== [Group Quotas on sbak]
Quota information for storage pool Default (ID: 1):

    user/group ||        size        ||    chunk files
     name |   id ||     used |    hard ||    used |    hard
--------------|-------||------------|------------||---------|---------
 panteater_lab| 158447||     0 Byte|1024.00 GiB||       0|unlimited
   alpha_users| 158537||     0 Byte|1024.00 GiB||       0|unlimited
     panteater|1847005||  30.82 GiB|1024.00 GiB||  364668|unlimited
The above shows that the user panteater used ~31GB of the allocated 1TB for all backups. Currently, all of the backup files are written by the user and group panteater (the primary user group).
To see the quota for dfs6 and selective backup:
[user@login-x:~]$ dfsquotas panteater "dfs6 sbak"
7. File recovery from snapshots
Only files and directories stored in $HOME or /pub/<user> are backed up.
Files and directories can be recovered provided they exist in the snapshots. Note: you have to be on a login node to access backup files.
Here is a general procedure for the user panteater to restore an accidentally deleted directory spring-2022 and the files in it.
[panteater@login-i15] cd /sbak/selective-backup/hpc-backups/panteater/DELETED-FILES   (1)
[panteater@login-i15] find . -type d -name spring-2022                                (2)
./2022-0621/pub/panteater/spring-2022
./2022-0629/pub/panteater/spring-2022
[panteater@login-i15] ls ./2022-0629/pub/panteater/spring-2022/                       (3)
schedule1  schedule1.sub  slurm.template
[panteater@login-i15] cp -p -r ./2022-0629/pub/panteater/spring-2022 /pub/panteater   (4)
(1) This command puts you at the top level of the backup directory for your files.
(2) This command finds all backups by date where the desired directory exists.
(3) Run the ls command for a specific snapshot to see if it has the needed files.
(4) At this point the user can copy the files back to the pub directory. It is recommended to use the -p and -r options. Option -p makes sure that the copy command preserves the time stamp and the ownership of a file. Option -r means "copy recursively", which is needed when copying a directory and its contents.