How to Use BeeGFS Parallel Storage

1. Introduction

Scalable storage on HPC3 in the /dfsX and /pub file paths is provided by parallel file systems running BeeGFS on top of ZFS. Performance of each file system is quite good (5-6 GByte/s) when used properly. Currently there are six such systems with an aggregate throughput of more than 30 GByte/s. While each file system is capable, it is not too difficult for applications to exceed the inherent capabilities and completely wreck performance for everyone. It is beyond the scope of this document to describe parallel file systems in detail, but it is worth learning more about them in order to use the file systems well. Take-home concepts about parallel file systems are:


  • The largest performance bottleneck in any parallel file system is accessing the meta data server.

  • Parallel file systems perform well when reading/writing large files in good-sized (> 128KB) chunks.

  • They perform very poorly when reading/writing many small files.

  • All DFS systems are intended for scratch data. While storage is RAID protected, it is only single-copy with no snapshots. Deleted data is gone forever.
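A rough way to observe the large-chunk point in practice is a simple dd streaming-write test; this is an illustrative sketch (the ddtest file name is arbitrary, and the resulting rate will vary with system load):

[user@login-x:~]$ dd if=/dev/zero of=/pub/$USER/ddtest bs=1M count=1024 conv=fdatasync
[user@login-x:~]$ rm /pub/$USER/ddtest

Here bs=1M writes in 1 MiB chunks (well above the 128KB guideline) and conv=fdatasync forces the data to disk before dd reports a rate. Writing the same gigabyte as thousands of small files would be dramatically slower because of the metadata overhead described above.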

2. Account

There is NO separate account for BeeGFS filesystems. Each user is provided:

  • 1TB quota in /pub/$USER. Note, /pub/$USER is a short name for /dfs6/pub/$USER

  • 1TB backup quota for a selective backup

  • /dfsX/<lab-path> lab group quota (based on PI’s purchase allocation). The storage owner (PI) can specify what users have read/write capability on the specific filesystem.

  • See How quotas are enforced using groups

If you submit a ticket requesting to be added to a specific group for access to a specific filesystem, please note that we will need confirmation from your PI in order to approve your request.

2.1. What to Store

  • any large input data files that are used for computational jobs

  • transient job input/output/error files

  • any frequently changing files

  • large user-authored software installations

2.2. Where to Store

Pick a location depending on the type of data (personal or group access).

  1. Private area - /pub/$USER

    Each user gets a 1 TB allocation in /pub/$USER, which is a unique private access area. Use this for data you don’t need to share with anyone.

  2. Group area - /dfsX/<lab-path>

    Most users have access to one or more group-shared areas. Within this area, all group members have read and write access. The organization is up to the group members, with the following exception.

    This area is initially created with the group sticky bit set so that only allowed users can access it. We advise users NOT to change permissions on the directories and files when writing in the group area. Incorrect permissions can lead to quota exceeded errors.
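    You can verify that the bit is set with ls -ld. The output will look similar to the sketch below (the group name and exact permission bits will differ for your lab; the important part is the s in the group permissions):

    [user@login-x:~]$ ls -ld /dfsX/<lab-path>
    drwxrws--- 7 panteater panteater_lab 7 Aug  5  2019 /dfsX/<lab-path>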

2.3. File permissions

File permissions and group ownership determine how usage is charged against quotas.

All data in Unix is organized into files, all files are organized into directories and the directories are organized into a tree-like structure called the filesystem.

In Unix, there are three basic types of files:

ordinary file

a file that contains data, text, or program instructions.

directory

directories store ordinary and special files. Unix directories are equivalent to folders on Windows or macOS.

special file

a file that provides access to hardware (such as a hard drive) or serves a special purpose (such as a symbolic link).

Every file in Unix has the following access modes:

read, denoted as r

the capability to read or view the contents of the file.

write, denoted as w

the capability to modify or remove the contents of the file.

execute, denoted as x

the capability to run a file as a program.

sticky bit, denoted as s

an additional mode bit used to set the Set User ID (SUID) and Set Group ID (SGID) permissions. On a directory, the SGID form (shown as s in the group permissions) makes files and subdirectories created inside it inherit the directory’s group.

Every file in Unix has the following permission attributes:

owner

determines what actions the owner of the file can perform on the file.

group

determines what actions a user who is a member of the file’s group can perform on the file.

other (world)

determines what actions all other users can perform on the file.

File permissions can be displayed using the ls -l command:

[user@login-x:~]$ ls -l
total 55524423
drwxrwsr-x  7 panteater bio                 7 Aug  5  2019 biofiles
-rw-r--r--  1 panteater panteater  4294967296 May 31  2019 performance.tst

The first column in the output represents file type and its associated permissions. For example, drwxrwsr-x for biofiles means:

  • the 1st character d is the file type, in this case a directory

  • the next three characters rwx in positions (2-4) are the file’s owner permissions: the owner has read (r), write (w) and execute (x) permission.

  • the second group of three characters rws in positions (5-7) are the permissions for the group to which the file belongs: the group has read (r), write (w) and execute (x) permission, and the sticky bit is set (a lowercase s means execute is set as well).

  • the third group of three characters r-x in positions (8-10) are the permissions for everyone else: here read (r) and execute (x).
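If you prefer octal notation, GNU stat can print both forms for the same files. This is a sketch using the listing above (assuming GNU coreutils stat, as on HPC3 login nodes); the leading 2 in 2775 is the sticky/SGID bit:

[user@login-x:~]$ stat -c "%A %a %U %G %n" biofiles performance.tst
drwxrwsr-x 2775 panteater bio biofiles
-rw-r--r-- 644 panteater panteater performance.tst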

3. Check DFS Quotas

Users are granted space with default allocations. Group (PIs) can purchase additional allocations.

  1. Every user’s default group is the same as their login. We call this the personal group. The only personal quota is on /dfs6/pub/<UCNetID>; the rest are group quotas.

  2. Users have 1 byte quotas on all DFS systems except where they have a personal quota; it is the group quota that is used. If you create a file with the incorrect group, you will likely see over quota errors.

  3. When writing in a group area, users need to remember that all members of the group contribute to the quota: it is the sum total usage that counts. When quotas are exceeded, users can no longer write in the affected filesystem and will need to remove some files and directories to free space.

  4. Users can’t change quotas, but can submit a ticket asking to be added to a group quota, provided there is confirmation from the PI about the change.
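To check which groups you belong to (and therefore which group quotas your files can be charged against), run the id command. The output below is a sketch using the example user and group IDs from the dfsquotas listing that follows:

[user@login-x:~]$ id
uid=865(panteater) gid=865(panteater) groups=865(panteater),12345(panteater_lab),158537(alpha_users)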

For DFS file systems one can use dfsquotas to check user/group quotas on a particular DFS pool.

To see the quotas for user panteater on DFS pool /dfs6:

[user@login-x:~]$ dfsquotas panteater dfs6
==== [Group Quotas on dfs6]

Quota information for storage pool Default (ID: 1):

      user/group     ||           size          ||    chunk files
     name     |  id  ||    used    |    hard    ||  used   |  hard
--------------|------||------------|------------||---------|---------
 panteater_lab|012345||   26.25 TiB|   50.00 TiB||  1310459| 18500000 (1)
   alpha_users|158537||      0 Byte|      1 Byte||        0|        1 (2)
     panteater|000865||  755.59 GiB| 1024.00 GiB||   258856|unlimited (3)

The above shows that the user panteater:

1 can write in the allocation for the group panteater_lab, where the total space is 50 TB and ~26 TB of it is already used. Note, the space used by the group includes all users allowed to write in this area.
2 belongs to a supplementary group alpha_users; this group has no allocation (1 byte), and the user will not be able to store any files that have this group ownership.
3 can write in the personal /pub/panteater area, where the default allocation is 1 TB and ~756 GB is already used by the user.

4. Over Quota

All BeeGFS-based file systems have quota enforcement. When a quota is filled, users will not be able to write any files or directories, and submitted jobs will fail with quota exceeded errors.

Quota is enforced by the file system based upon the Unix group membership of a particular file. For example,

[user@login-x:~]$ ls -l
total 55524423
drwxrwsr-x  7 panteater bio                 7 Aug  5  2019 biofiles
-rw-r--r--  1 panteater panteater  4294967296 May 31  2019 performance.tst
drwxrwsr-x  3 panteater panteater           2 Oct  8 17:11 myfiles

The user panteater is storing files under two different groups: bio and panteater. The file performance.tst is charged against the panteater group quota, while the files in the subdirectory biofiles should be charged to the bio group quota.

Examine the permissions of the directories: drwxrwsr-x. Notice the 's' in the group execute position. This is called the sticky bit for the directory. Compare to permissions without the sticky bit: drwxrwxr-x. The difference is subtle but important ('s' instead of 'x' in the group execute permission).

Sticky bit means:

With the sticky bit set

drwxrwsr-x

files and directories created in it take the group membership of the directory (the group is "sticky"). The sticky bit is also set on any newly-created subdirectories.

With the sticky bit NOT set

drwxrwxr-x

files written into the directory are written with the active Unix group. Since this defaults to your login group, it may not be what you want or expect. The Unix command newgrp can be used to change the active Unix group.

Under normal operation, when the sticky bit is set on a directory, correct quota enforcement occurs automatically because files and subdirectories are written with the correct group.
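You can see this behavior with a small experiment. The following is a sketch assuming you are a member of the bio group from the example above (the demo names are arbitrary, and the exact permission bits also depend on your umask):

[user@login-x:~]$ mkdir demo && chgrp bio demo && chmod g+s demo
[user@login-x:~]$ touch demo/afile && mkdir demo/subdir
[user@login-x:~]$ ls -l demo
total 1
-rw-r--r-- 1 panteater bio 0 Oct  8 17:20 afile
drwxr-sr-x 2 panteater bio 2 Oct  8 17:20 subdir

Both the file and the subdirectory take the bio group from the parent directory, and the subdirectory inherits the s bit, so anything written deeper in the tree is charged to the bio quota as well.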

The most common quota problems on DFS result from inadvertently removing the sticky bit on a directory and then writing with the default (user’s personal group). In this case writes (and jobs) can fail.

Moving data to HPC3 with software that overrides the sticky bit by explicitly setting permissions is the most common way a sticky bit becomes unset.

4.1. Fixing Permissions

You can use the chmod command in Unix to fix directories that do not have the sticky bit set but should. The following command will add the sticky bit to a particular directory:

[user@login-x:~]$ chmod g+s <directory>

You can use the find command to set the sticky bit on all directories in a subtree (including the current one):

[user@login-x:~]$ find . -type d -exec chmod g+s {} \; -print
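To first check which directories in a subtree are missing the bit before fixing anything, you can negate the permission test (standard find syntax):

[user@login-x:~]$ find . -type d ! -perm -g+s -print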

4.2. Fixing Group Ownership

You can also use the chgrp command to change the group ownership of a file or directory. For example, the following command would change the group from panteater to bio in the example listing above:

[user@login-x:~]$ chgrp bio performance.tst
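chgrp also accepts a recursive flag, which is handy for fixing the group on a whole directory tree, such as biofiles from the listing above:

[user@login-x:~]$ chgrp -R bio biofiles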

5. Data transfer

If you need to bring data from your laptop or another host to the cluster, you will mainly use the scp or rsync commands (Windows has equivalent tools). You will need to give extra command-line parameters to ensure that the data transfer program you use respects the sticky bit and does not cause quota issues.

5.1. Using scp

scp is a secure copy program. It allows one to connect to a remote server and transfer files over the connection. However, when files are transferred, the sticky bits on destination directories are not inherited. This is not a problem if users are copying files to /pub/$USER, but it is a problem when copying to a /dfsX/<lab-path> area and usually results in quota exceeded errors.

There are two ways to deal with this.

  1. Scenario 1

    Scp the needed files (using the recursive option if needed). For example, a user has access to a group allocation /dfsX/panteater_lab/panteater and wants to transfer data there.

    On your laptop or other server:

    scp -r mydata panteater@hpc3.rcic.uci.edu:/dfsX/panteater_lab/panteater

    On the cluster, check the permissions of the transferred directory:

    [user@login-x:~]$ ls -l /dfsX/panteater_lab/panteater
    total 138
    drwxr-xr-x 6 panteater panteater_lab     18 Feb 18 13:10 mydata

    Note, the permissions drwxr-xr-x are missing s (the sticky bit is not set), which means all subdirectories under mydata are also missing it. You will need to fix the permissions on mydata:

    [user@login-x:~]$ chmod g+s /dfsX/panteater_lab/panteater/mydata

    and, similarly, on all subdirectories under it, as shown below.
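    Instead of fixing each subdirectory by hand, you can apply the find command from section 4.1 to the transferred tree:

    [user@login-x:~]$ find /dfsX/panteater_lab/panteater/mydata -type d -exec chmod g+s {} \;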

  2. Scenario 2 requires less work and is more reliable

    On your laptop (or remote server), create a compressed tar file of the files you want to transfer and then scp the compressed file:

    tar czvf mydata.tar.gz mydata
    scp mydata.tar.gz panteater@hpc3.rcic.uci.edu:/dfsX/panteater_lab/panteater

    On the cluster, uncompress the transferred file and check permissions:

    [user@login-x:~]$ cd /dfsX/panteater_lab/panteater
    [user@login-x:~]$ tar xzf mydata.tar.gz
    [user@login-x:~]$ ls -l
    total 138
    drwxr-sr-x 6 panteater panteater_lab     18 Feb 18 13:12 mydata
    [user@login-x:~]$ ls -l mydata
    total 124
    -rw-r--r--  1 panteater panteater_lab 17075 Jul 21  2020 desc.csv
    -rwxr-xr-x  1 panteater panteater_lab  7542 Jul 21  2020 README
    drwxr-sr-x  2 panteater panteater_lab     4 Feb 18 12:03 common
    drwxr-sr-x  2 panteater panteater_lab     3 Feb 18 12:03 images

    Note, the permissions drwxr-sr-x on mydata include s, and all directories under mydata have inherited it. Delete the transferred mydata.tar.gz after verification.

5.2. Using rsync

rsync is a program that can greatly speed up file transfers. See man rsync for more information and options.

There are two rsync options that overwrite the destination permissions; they are a common source of problems for users:

  • -p, --perms preserve permissions

  • -a, --archive archive mode; same as -rlptgoD, implies -p

When the -p option is used, rsync preserves the permissions of the source, which is not correct for files and directories at the destination that need to comply with the destination’s user and group permissions.

Avoid using -p, -a options when running rsync commands.

For example, for a recursive copy use:

rsync -rv mydata panteater@hpc3.rcic.uci.edu:/dfsX/panteater_lab/panteater
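If you want rsync itself to guarantee the sticky bit, one possible alternative (assuming your rsync version supports the --chmod option) is to force the bit on every transferred directory rather than preserving source permissions:

rsync -rv --chmod=Dg+s mydata panteater@hpc3.rcic.uci.edu:/dfsX/panteater_lab/panteater

The D prefix in --chmod=Dg+s applies the g+s change to directories only, so regular files are unaffected.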

6. Selective Backup

We cannot back up everything on the cluster. Selective Backup allows users to choose what is important and have it automatically saved. The backup server is physically located away from the cluster for extra protection.

You will want to back up only critical data such as scripts, programs, etc.
DO NOT back up data you can get from other sources, especially large data sets.
If you go past your backup quota, backups stop for your account: the backup will fail because no new data can be written to the backup server once you have reached your limit.

6.1. Default settings

The Selective Backup is based on rsync in conjunction with GNU Parallel. The combination maximizes the network throughput and server capabilities in order to back up hundreds of user accounts from multiple public and private filesystems.

The Selective Backup process will automatically start saving your home directory as well as some public and private disk spaces. There is nothing for you to do if you like the defaults.

Users manage their Selective Backup via two control files located in their $HOME directory:

  1. .hpc-selective-backup

    The .hpc-selective-backup file lists (1) backup options and (2) the names of files/directories to be saved, in order of priority from the most to the least important. All backup options are initially commented out. The files are backed up in the order they are listed; that way, if a user runs out of selective backup quota before all listed files have been backed up, at least their most prized data are saved. By default, this file contains the $HOME and /pub areas of your account:

    /data/homezvol0/panteater
    /pub/panteater
  2. .hpc-selective-backup-exclude

    This file lists the files/directories you want to exclude from the backup. By default, it excludes ZFS snapshots from $HOME:

    $HOME/.zfs

For more information on rsync exclude patterns please see the "ANCHORING INCLUDE/EXCLUDE PATTERNS" section of man rsync.
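As an illustration, a customized .hpc-selective-backup-exclude could look like the following; the .cache and scratch entries are hypothetical examples, not defaults:

$HOME/.zfs
$HOME/.cache
/pub/panteater/scratch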

The following are all available backup options for .hpc-selective-backup and what they do:

HPC_SEND_EMAIL_SUMMARY

Sends you daily email summaries of your saves. Default is NO summary email notifications.

HPC_SEND_EMAIL_ON_ERROR

You will receive an email if rsync completes with an error (a non-zero exit status from rsync). Consult the rsync man page for error values and their meaning. If rsync finds no errors, no email is sent. Default is NO email notifications.

HPC_KEEP_DELETED=X

Keep deleted files on the backup server for X days, where X is a number from 0 to 90. Deleted files are files you removed from the source location. Default is 14 days.
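Putting the options together, a customized .hpc-selective-backup could look like this sketch: un-commented options from the table above, followed by paths in priority order (the important-project path is a hypothetical example):

HPC_SEND_EMAIL_ON_ERROR
HPC_KEEP_DELETED=30
/data/homezvol0/panteater
/pub/panteater/important-project
/pub/panteater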

6.2. Custom settings

To customize, edit control files with your favorite editor.

We highly recommend that you

  1. request email notifications to make sure things are working

    Choose one of the two SEND_EMAIL options in the .hpc-selective-backup file and un-comment it (remove the # sign at the beginning of the line). For example, if you choose to receive email notifications in the event of errors, edit your configuration file and change the line:

    # HPC_SEND_EMAIL_ON_ERROR

    to this:

    HPC_SEND_EMAIL_ON_ERROR
  2. perform some spot checks of what you think is being saved to make sure your data is indeed being backed up.

6.3. Where backups are

A user can access backup files on the login nodes of the cluster:

/sbak/selective-backup/hpc-backups/$USER/data/homezvol*/$USER

backup of $HOME on HPC3

/share/legacyhpc/users/$USER

old $HOME from HPC. Removing July 1, 2021

/sbak/selective-backup/hpc-backups/$USER/pub/$USER

backup of /pub/$USER/

/sbak/selective-backup/hpc-backups/$USER/DELETED-FILES/$DATE

deleted files, organized by date; these count towards your backup quota

/sbak/selective-backup/hpc-logs/$DATE/$USER

backup logs, available for the past X days

6.4. Quotas for Selective Backup

To see the quota for selective backup:

[user@login-x:~]$ dfsquotas panteater sbak
==== [Group Quotas on sbak]

Quota information for storage pool Default (ID: 1):

      user/group      ||           size          ||    chunk files
     name     |   id  ||    used    |    hard    ||  used   |  hard
--------------|-------||------------|------------||---------|---------
 panteater_lab| 158447||      0 Byte| 1024.00 GiB||        0|unlimited
   alpha_users| 158537||      0 Byte| 1024.00 GiB||        0|unlimited
     panteater|1847005||   30.82 GiB| 1024.00 GiB||   364668|unlimited

The above shows that the user panteater has used ~31 GB of the allocated 1 TB for all backups. Currently, all of the backup files are written by the user and group panteater (the user’s primary group).

To see the quota for dfs6 and selective backup:

[user@login-x:~]$ dfsquotas panteater "dfs6 sbak"

7. File recovery from backups

Only the files and directories listed in your .hpc-selective-backup control file (by default, $HOME and /pub/$USER) are backed up.

Files and directories can be recovered provided they exist in the backups. Note: you have to be on a login node to access backup files.

Here is a general procedure for user panteater to restore an accidentally deleted directory spring-2022 and the files in it.

[panteater@login-i15] cd /sbak/selective-backup/hpc-backups/panteater/DELETED-FILES (1)
[panteater@login-i15] find . -type d -name spring-2022                              (2)
./2022-0621/pub/panteater/spring-2022
./2022-0629/pub/panteater/spring-2022
[panteater@login-i15] ls ./2022-0629/pub/panteater/spring-2022/                     (3)
schedule1  schedule1.sub slurm.template
[panteater@login-i15] cp -p -r ./2022-0629/pub/panteater/spring-2022 /pub/panteater (4)
1 This command puts you at the top level of the backup directory for your deleted files.
2 This command finds all dated backups in which the desired directory exists.
3 Run the ls command on a specific backup to see if it has the needed files.
4 At this point the user can copy the files back to the pub directory. It is recommended to use the -p and -r options: -p preserves the time stamp and the ownership of a file, and -r means "copy recursively", which is needed when copying a directory and its contents.
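The same approach works for recovering a single file. For example, to restore just slurm.template from the backup found above:

[panteater@login-i15] find . -name slurm.template
./2022-0629/pub/panteater/spring-2022/slurm.template
[panteater@login-i15] cp -p ./2022-0629/pub/panteater/spring-2022/slurm.template /pub/panteater/spring-2022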