DFS

Overview

Scalable storage on HPC3 under the /dfsX and /pub paths is provided by parallel file systems running BeeGFS on top of ZFS. This is a network-based, multi-petabyte storage cluster for the UCI campus research community. It gives researchers across UCI a reliable and resilient location to store their research data and share it with defined groups.

Performance of each file system is quite good (5-6 GByte/s) when used properly. Currently, there are multiple DFS systems with an aggregate throughput of more than 30 GByte/s.

Important

While each file system is capable, it is not too difficult for a single user to exceed the inherent capabilities and completely wreck performance for everyone.

It is beyond the scope of this document to describe parallel file systems in detail, but learning more about them will help you make better use of the file systems.

Take-home concepts about DFS parallel file systems are:
  • They perform well when reading/writing large files in good-sized (> 128KB) chunks.

  • They perform very poorly when reading/writing many small files (see the sketch after this list).

  • All DFS systems are single copy, high-performance storage intended for scratch data. No special backup is available. Deleted data are gone forever.

  • Accessible only from HPC3 as a regular filesystem for storing data on the cluster. There are a few separate storage pools which are mounted on the cluster as /dfsX.

  • For recharge allocations please see Purchase DFS storage.

  • Warning

    DFS filesystems must not be used to store personally identifiable information that falls under guidelines such as FERPA (e.g., student data) and HIPAA (health-care data).

    If you are unsure whether DFS is suitable for your data, please refer to the general guidance for data security provided by the UCI Office of Research.
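For example, when a workflow produces thousands of small files, it is usually far better to bundle them into a single archive before writing to or reading from DFS. A minimal sketch (the directory and archive names are hypothetical, and $TMPDIR is assumed to point at node-local scratch space):

$ tar -cf /pub/panteater/results.tar many_small_files/    # write one large archive instead of many small files
$ tar -xf /pub/panteater/results.tar -C $TMPDIR           # unpack into node-local scratch before processing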

Allocations

There is NO separate account for DFS filesystems. Quotas are enforced using groups.

No-cost Private area:
All users have access to the Private area. Each user is provided with a default allocation:
- 1TB quota per account in /pub/ucinetid
- 1TB backup quota for selective backup

Recharge allocation - Group Shared area:

UCI Faculty members can have low-cost recharge allocation(s) to fulfill their needs. These group areas are quota allocations in /dfsX/group-lab-path based on the PI's purchase. The storage owner (PI) can specify additional users who have read/write capability on the filesystem. Please see Recharge Allocations for details on how to request one.

Note

If you are submitting a ticket requesting to be added to a specific group for access to a specific filesystem, please note that we will need confirmation from your PI in order to approve your request. Cc your PI when submitting the ticket.

Storing Files

What to Store
  • Any frequently changing files

  • Any large input data files that are used for computational jobs

  • Transient job input/output/error files

  • Large user-authored software installations

Where to Store

Pick a location depending on the type of data (personal or group access)

  1. The /pub/ucinetid is a unique PRIVATE access area and is NOT shared with other users.

  2. Most users have access to one or more group-shared areas in /dfsX/<group-lab-path>. Within this area, all group members have read and write access. The organization is up to the group members with one exception: do not change sticky bit settings.
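To see which UNIX groups, and therefore which group-shared areas, your account can use, run the groups command. A hypothetical example (the lab group name is illustrative):

$ groups
panteater panteater_lab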

File permissions

Important

File permissions are used in determining quotas. When we create a Private area or a Group shared area on DFS, we set the correct permissions on the top-level directories. The permissions involve setting logical UNIX groups. Please see the UNIX primer to familiarize yourself with UNIX groups.

Warning

Each group lab area is initially configured with the group sticky bit set so that only allowed users can access this area. We advise users NOT to change permissions on directories and files when writing in the group area. Incorrect permissions can lead to quota exceeded errors.

Please make sure you understand UNIX File permissions.
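One quick way to verify that a top-level directory still carries the intended group and sticky bit is ls -ld. A hypothetical check (the group area path and the output are illustrative):

$ ls -ld /dfs6/panteater_lab
drwxrws--- 8 root panteater_lab 12 Jan 10 09:00 /dfs6/panteater_lab

The s in the group permissions shows that the sticky bit is set and that files written here will belong to the panteater_lab group.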

Quotas

All DFS-based file systems have quota enforcement.

  • Every user has a default personal group which is the same as their login. The only 1TB personal quota is on /pub/ucinetid; the rest are group quotas.

  • Every user has a default 1TB selective backup quota.

  • On all DFS systems, users have only a 1-byte quota for their personal group (apart from the personal quota on /pub/ucinetid); it is the group quota that is used. If you create a file with the incorrect group, you will likely see over-quota errors.

  • When writing in a group area, users need to remember that all members of the group contribute to the quota; it is the sum total usage that counts. When quotas are exceeded, users can no longer write in the affected filesystem and will need to remove some files and directories to free space (see the sketch after this list).

  • Users can’t change quotas, but can submit a ticket asking to be added to a group quota, provided there is confirmation from the PI about the change.
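When a group is over quota, du can help identify which directories consume the most space before deciding what to remove. A minimal sketch (the group area path is hypothetical):

$ du -sh /dfs6/panteater_lab/*/ | sort -h    # per-directory usage, smallest to largest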

How to check

For all DFS file systems, including selective backup, one can use the dfsquotas command to check user/group quotas on a particular DFS pool.

To see the quotas for user panteater on private allocation in /dfs6:

$ dfsquotas panteater dfs6

==== [Group Quotas on dfs6]

Quota information for storage pool Default (ID: 1):

      user/group     ||           size          ||    chunk files
     name     |  id  ||    used    |    hard    ||  used   |  hard
--------------|------||------------|------------||---------|---------
 panteater_lab|012345||   26.25 GiB| 1024.00 GiB||  1310459|unlimited  # see 1
   alpha_users|158537||      0 Byte| 1024.00 GiB||        0|unlimited  # see 2
     panteater|000865||  755.59 GiB| 1024.00 GiB||   258856|unlimited  # see 3

The above shows that user panteater can write in their personal area /pub/panteater using the three groups listed above:

  1. panteater belongs to a supplementary group panteater_lab, and wrote 26.25 GiB of data.

  2. panteater belongs to a supplementary group alpha_users, and did not write any files using this group, but can if needed.

  3. using the default panteater group, the user wrote ~756 GiB of the total 1TB allocation (1TB = 1024 GiB).

Note

The groups listed above are logical UNIX groups associated with the user account, and the primary use of such groups is to assign “group ownership” of files and directories. The 1TB allocation is the total space that can be used by all of the user's listed UNIX groups combined, not by each group individually.

To see the quotas for user panteater in lab shared allocation in /dfs9:

$ dfsquotas panteater dfs9

==== [Group Quotas on dfs9]

Quota information for storage pool Default (ID: 1):

      user/group     ||           size          ||    chunk files
     name     |  id  ||    used    |    hard    ||  used   |  hard
--------------|------||------------|------------||---------|---------
 panteater_lab|012345||   38.36 TiB|   40.00 TiB||  1310459|unlimited  # see 1
   alpha_users|158537||      0 byte|    1   byte||        0|        1  # see 2
     panteater|000865||      0 byte|    1   byte||        0|        1  # see 2

  1. The above shows that user panteater can write in the group allocation on dfs9 only when using UNIX group panteater_lab, for which there is a 40TB allocation. Note that the allocated space (40TB) and the used space (38.36 TiB) are totals across all users allowed to write in this area.

  2. There is a zero quota (shown as 1 byte) for the personal UNIX group panteater and the supplemental UNIX group alpha_users. If a user tries to write using these UNIX groups, it will result in permission and over-quota errors.

Over quotas

When the quota is filled, users will not be able to write any files or directories, and submitted jobs will fail with quota exceeded errors.

Quota is enforced by the file system based upon the Unix group membership of a particular file. For example,

$ ls -l
total 55524423
drwxrwsr-x  7 panteater bio                 7 Aug  5  2019 biofiles
-rw-r--r--  1 panteater panteater  4294967296 May 31  2019 performance.tst
drwxrwsr-x  3 panteater panteater           2 Oct  8 17:11 myfiles

The user panteater is storing files under two different groups:

  • the files in the subdirectory biofiles are charged to the bio group quota.

  • the file performance.tst and subdirectory myfiles are charged to the panteater group quota.

Examine the permissions of the directories: drwxrwsr-x. Notice the s in the group permissions (character positions 5-7). This is called the sticky bit for the directory. It is a subtle but important difference: s instead of x in the group execute position. Compare with the permissions when the sticky bit is not set:


  • Sticky bit is set (directory mode drwxrwsr-x): in the origin directory, created files and directories are written with the group of the origin directory; the group permissions rws show that the sticky bit s is set.

  • Sticky bit is NOT set (directory mode drwxrwxr-x): in the origin directory, created files and directories are written with your active UNIX group, which defaults to your login group (group permissions rwx).

The Unix command newgrp can be used to change the active Unix group.

For example, the user panteater by default has the group panteater. The following sequence of simple commands shows the ownership of files created under different groups and how to use the newgrp command.

$ id panteater
uid=1234567(panteater) gid=1234567(panteater) groups=1234567(panteater),158571(bio)
$ touch aaa
$ ls -l aaa
-rw-rw-r-- 1 panteater panteater 0 Nov  3 14:57 aaa

$ newgrp bio
$ touch bbb
$ ls -l bbb
-rw-rw-r-- 1 panteater bio 0 Nov  3 14:57 bbb

Please type man newgrp to learn about this command.

Reasons for Over Quota
  1. Under normal operation, when the sticky bit is set on a directory, correct quota enforcement occurs automatically because files and subdirectories are written with the correct group; no newgrp command is needed. An over-quota error is issued only when all allocated space is used.

  2. The most common quota problems on DFS result from:

    • inadvertently removing the sticky bit on a directory and then writing with the default personal group.

    • changing the group ownership of a file or directory and then trying to write to it with the default personal group.

    In these cases writing files and running jobs can fail.

  3. Moving data to HPC3 with software that overrides the sticky bit by explicitly setting permissions is the most common way a sticky bit becomes unset (an rsync sketch follows this list).

    Note

    Please see Data transfer for information on how to move data to the cluster.
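As an illustration only, rsync is a common case: its archive mode copies source permissions and group ownership, which can override the sticky bit, while its --chmod option can keep the setgid (sticky) bit set on the directories it creates. A hedged sketch (the hostname and paths are hypothetical):

# force the sticky bit on every directory rsync creates in the group area
$ rsync -rltp --chmod=Dg+s mydata/ hpc3.rcic.uci.edu:/dfs6/panteater_lab/panteater/mydata/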

Fix over quotas

Fixing Permissions

You can use the chmod command to fix directories that don’t have a sticky bit set, but should have. The following command will add the sticky bit to a particular directory.

$ chmod g+s directory-name

You can use the find command to find all directories in a subtree and combine it with chmod command to set the sticky bit on all found directories:

$ find . -type d -exec chmod g+s {} \; -print
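To preview which directories are missing the sticky bit before changing anything, the same find can be run without the chmod action (a minimal sketch):

$ find . -type d ! -perm -g+s -print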
Fixing Group Ownership

You can also use the chgrp and chown commands to change the group ownership of a file or directory. For example, to change the group from panteater to bio on a specific file or directory:

$ ls -l
total 55524423
drwxrwsr-x  7 panteater bio                 7 Aug  5  2019 biofiles
-rw-r--r--  1 panteater panteater  4294967296 May 31  2019 performance.tst
drwxrwsr-x  3 panteater panteater           2 Oct  8 17:11 myfiles

$ chgrp bio performance.tst
$ chown -R panteater:bio myfiles
$ ls -l
total 55524423
drwxrwsr-x  7 panteater bio                 7 Aug  5  2019 biofiles
-rw-r--r--  1 panteater bio        4294967296 May 31  2019 performance.tst
drwxrwsr-x  3 panteater bio                 2 Oct  8 17:11 myfiles

The ls -l command is used to show the group ownership before and after the change.
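If many files in a tree ended up with the wrong group, find can locate and fix them in one pass. A minimal sketch, assuming the desired group is bio:

# change the group of everything under the current tree that is not already in bio
$ find . ! -group bio -exec chgrp bio {} \; -print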

Selective Backup

We cannot back up everything on the cluster. Selective Backup allows users to choose what is important and have it automatically saved. The physical location of the backup server is different from the cluster location for extra protection.

Note

You will want to back up only critical data such as scripts, programs, etc.

Warning

DO NOT back up data you can get from other sources, especially large data-sets.

Important

If you exceed your backup quota, backups stop for your account: the backup will fail because no new data can be written to the backup server once you have reached your limit.

Default settings

The Selective Backup is based on rsync in conjunction with GNU Parallel. The combination maximizes the network throughput and server capabilities in order to back up hundreds of user accounts from multiple public and private filesystems.

The Selective Backup process will automatically start saving your home directory as well as some public and private disk spaces.

Note

For the majority of users the defaults are sufficient.
There is nothing for you to do if you like the defaults.

Users manage their Selective Backup via two control files located in their $HOME directory:

  1. .hpc-selective-backup This file lists (1) backup options and (2) the names of files/directories to be saved, in order of priority from the most to the least important. All backup options are initially commented out.

    The files are backed up in the order they are listed. That way, if a user runs out of selective backup disk quota before all listed files have been backed up, at least their most prized data are saved. By default, this file contains the $HOME and /pub areas of your account:

    /data/homezvolX/ucinetid
    /pub/ucinetid
    

    The following table lists all available backup options:

    • HPC_SEND_EMAIL_SUMMARY: sends you daily email summaries of your saves. Default is NO summary email notifications.

    • HPC_SEND_EMAIL_ON_ERROR: you will receive an email only if rsync completes with an error, meaning a non-zero exit status from rsync. Consult the man rsync page for error values and their meaning. Default is NO email notifications.

    • HPC_KEEP_DELETED=X: keep deleted files on the backup server for X days, where X is a number in the 0-90 range. Deleted files are files you removed from the source location. Default is 14 days.

  2. .hpc-selective-backup-exclude This file lists the names of files/directories you want to exclude from the backup. By default, this file excludes ZFS snapshots from $HOME:

    $HOME/.zfs
    

    For more information on rsync exclude patterns, please see the “ANCHORING INCLUDE/EXCLUDE PATTERNS” section of the man rsync output.
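    For example, to also skip a large directory of data that can be re-downloaded, an extra line can be added to this file (the second entry below is hypothetical):

    $HOME/.zfs
    /pub/panteater/reference-genomes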

Custom settings

To customize, edit control files with your favorite editor. We highly recommend the following:

  1. request email notifications to make sure things are working

    Choose one of the two SEND_EMAIL options in the .hpc-selective-backup file and uncomment it (remove the # sign at the beginning of the line). For example, if you choose to receive email notifications in the event of errors, edit your configuration file and change the line:

    # HPC_SEND_EMAIL_ON_ERROR
    

    to:

    HPC_SEND_EMAIL_ON_ERROR
    
  2. perform some spot checks of what you think is being saved to make sure your data is indeed being backed up (a sketch is shown after the list of backup locations below).

Where backups are

A user can access backup files on the login nodes of the cluster from the following paths:

  • /sbak/zvolX/backups/ucinetid/data/homezvolX/ucinetid : user $HOME

  • /sbak/zvolX/backups/ucinetid/pub/ucinetid : /pub/$USER/

  • /sbak/zvolX/backups/ucinetid/DELETED-FILES : deleted files by date (counts towards backup quota)

  • /sbak/zvolX/logs/$DATE/ucinetid : backup logs by date, available for the past Y days

Note

The X in /sbak/zvolX maps to the volume number shown in your $HOME variable. In other words, the mapping is:
/data/homezvol0 -> /sbak/zvol0/backups
/data/homezvol1 -> /sbak/zvol1/backups
/data/homezvol2 -> /sbak/zvol2/backups
/data/homezvol3 -> /sbak/zvol3/backups
The number of days Y is defined by HPC_KEEP_DELETED=Y in your .hpc-selective-backup file.
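For the spot checks recommended above, one can compare a file in $HOME with its copy on the backup server. A minimal sketch, assuming a homezvol0 account and a hypothetical file name:

# no output from diff means the backup copy matches the original
$ diff ~/scripts/myjob.sub /sbak/zvol0/backups/$USER/data/homezvol0/$USER/scripts/myjob.sub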

Deleted Files Recovery

Note

Deleted files and directories can be recovered provided they exist in the selective backup.
You have to be on a login node to access backup files.

Below is a general procedure for user panteater to restore the directory spring-2022, and the files in it, that was accidentally deleted from /pub/panteater.

$ cd /sbak/zvol0/backups/panteater/DELETED-FILES                  # see 1
$ find . -type d -name spring-2022                                # see 2
./2024-0214/pub/panteater/spring-2022
./2024-0213/pub/panteater/spring-2022

$ ls ./2024-0214/pub/panteater/spring-2022/                       # see 3
schedule1    schedule1.sub   slurm.template

$ cp -p -r ./2024-0214/pub/panteater/spring-2022 /pub/panteater   # see 4

The above commands mean:

  1. The cd command puts you at the top level of a backup directory for your files.

  2. The find command finds all backups by date where the desired directory exists. Here, two snapshots are found by date: 2024-0214 and 2024-0213.

  3. Run the ls command for the specific snapshot to see if it has the needed files.

  4. If the needed files exist in the backup, the user can use the cp command to copy them back to the pub directory. It is recommended to use the -p and -r options. Option -p makes sure that the copy command preserves the time stamp and the ownership of a file. Option -r means “copy recursively”, which is needed when copying a directory and its contents.

Files and directories deleted from $HOME can be restored in a similar way.
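A hypothetical sketch of such a restore, assuming a homezvol0 account and that deleted $HOME files appear under the data/homezvolX path inside the dated snapshots, mirroring the layout shown above:

$ cd /sbak/zvol0/backups/panteater/DELETED-FILES
$ find . -name myjob.sub                                   # locate the deleted file in the dated snapshots
$ cp -p ./2024-0214/data/homezvol0/panteater/myjob.sub ~/  # copy a found file back to $HOME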