CRSP is a reliable and resilient network-based multi-Petabyte storage cluster
for the UCI campus research community to store and share their research data.
CRSP is funded through central campus to guarantee a fixed amount of no-cost storage to any
faculty member or staff researcher who requests space. These campus funds pay for the people,
the baseline infrastructure, and vendor maintenance required to provide the robust infrastructure.
While there are many possible use cases, a driving one is:
A faculty researcher and the data needed to support the research lab, graduate students
and postdocs. In this model, a Lab Area is created on CRSP and is logically owned by the
researcher. The Lab Area Owner can specify additional users who have read/write capability
on the area and how much space each can consume.
Warning
CRSP filesystems must not be used to store personally identifiable information that
would fall under guidelines such as FERPA
(student data) and HIPAA (health-care data).
If you are unsure whether CRSP is suitable for your data, please refer to the general guidance for
data security
provided by the UCI Office of Research.
CRSP is a combination of several technologies
It is built with industry-leading storage technology to ensure high data availability and resiliency.
It is multi-site and composed of commodity server components from Dell for cost-effective scaling
and performance.
The underlying parallel file system is GPFS (a.k.a. IBM Spectrum Scale).
Additional support and integration software comes from Kalray.
Features include
CRSP is only available on the UCI Network or through the campus VPN.
An active-active storage system setup between two hosting locations for high availability and redundancy,
with fully fault-tolerant high-speed networking.
End-to-end 24x7 software and hardware support.
A fully encrypted file system, featuring encryption at rest to ensure user data security.
A file system design that has massive scaling capabilities without compromising performance.
A front-end access layer design that is capable of scaling horizontally as demand grows.
Several user access methods are in place, with enterprise level support.
All access mechanisms are fully load-balanced between data centers.
User access methods for storing and retrieving data:
web browsers
folders on laptops or desktops
file path on UCI’s High-performance computing clusters (HPC3).
CRSP to CRSP2 Transition
On July 16, 2024, CRSP underwent a complete hardware upgrade to replace end-of-life hardware and expand capacity.
After the upgrade:
Change: All active user/lab files were copied from CRSP to its replacement (CRSP2).
What to do: If you actively access CRSP now, you will be able to do so after the upgrade. Your files will be in the same location as they were prior to the upgrade.
Change: DUO Multifactor Authentication is required for ALL desktop clients.
What to do: Please see Using DUO with CRSP for using SSH keys with strong passwords. Once you have set up key-based authentication, you need to reconfigure your CRSP Desktop Client Bookmark to use your SSH key instead of your password.
Change: The Secure Copy interface (scp) is no longer available.
I am a student/postdoc/researcher and I want to access my PI's lab:
You should send a request and include:
your UCInetID
your PI’s UCInetID or existing CRSP lab name
You must cc your request to the PI.
Once the ticket is generated (you receive an automated email response)
the PI will have to respond to the cc with a confirmation.
We will not create an account without your PI’s confirmation.
CRSP is funded through central campus to guarantee a fixed amount of no-cost storage to any PI
who requests space. These campus funds pay for the people, the baseline infrastructure,
and vendor maintenance required to provide the robust infrastructure.
A PI is a ladder-rank faculty member or a researcher who has been granted an exception
by the UCI Office of Research to act as PI on federal grants.
Each CRSP allocation is associated with a UCI PI's account and is provided as follows:
A directory on Unix is equivalent to a folder on macOS or Windows.
In what follows, we will use the term file to mean
file, folder, or directory.
This allocation space, called Lab, is a shared space area per Space Owner.
The Lab areas provide the most flexibility for access control and sharing:
The Allocation quota is for the whole Lab area allocation and is a sum of what is stored
in share and in all personal directories.
Each Grantee has a personal directory (named with grantee’s UCInetID). Only
grantee and the Space Owner can read/write files in this directory.
A directory called share is available to all members of the lab.
Anyone in the lab can read/write files stored under it.
The Space Owner
grants explicit access for this area to Grantees, decides how to allocate
the space among group members, and can place limits on individual Grantees.
has the ability to create files or new directories in the top-level of the Lab area.
by default has read access to every file and directory in the Lab area.
CRSP has many (and sometimes competing) goals for access, sharing, security,
manageability, and simplicity for researchers. One of the technical complexities
of CRSP is that the underlying file system and access enforcement mechanisms are
defined in Unix, but most user access is from Mac and Windows environments.
On Unix, independent access controls on every file are given to three different entities:
The owner of the file. This is the UCInetID that originally created the file.
The group of the file. A group that might have access to this file.
The world (or others). Everyone else on CRSP.
Important
In CRSP Lab areas sharing is controlled by group permissions
and by who is a member of the particular group. The world has no privilege
to read or write files in any Lab area.
File owners can make files explicitly private by
removing read/write group permissions.
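To illustrate the point above for users working on HPC3, here is a minimal sketch of removing group read/write permissions from a file. The helper name make_private and the file name in the usage comment are hypothetical; chmod is the standard Unix command, and the sketch assumes the file lives in an area you own.

```shell
# Hypothetical helper: make one or more files private to their owner
# by removing the group's read and write permission bits.
make_private() {
    # g-rw leaves the owner's permissions untouched and only
    # clears read/write for the file's group.
    chmod g-rw "$@"
}

# Usage (file name is an example only):
#   make_private draft-notes.txt
#   ls -l draft-notes.txt    # the group column should now show '---'
```

Restoring group access later is the reverse operation, chmod g+rw on the same file.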
For each Lab area, the PI is the owner of the space.
There are two Unix groups predefined for all labs:
pi_lab: only the lab owner is in this group
pi_lab_share: all members of the lab including the lab owner.
Example Lab
In the following, we will use the Lab for a PI ppapadop:
ppapadop is in the group ppapadop_lab and is the only member of this group.
ppapadop is in the group ppapadop_lab_share.
ckhacher, itoufiqu, tandriol, iychang are in the group ppapadop_lab_share.
They are Lab members (grantees) who were given access to the Lab area by the PI.
Fig. 4.10 Example Lab top-level folder (using Mac CRSP Desktop)
This shows that for the ppapadop Lab on CRSP:
User ppapadop, who is the PI, can see all files anywhere in the Lab area.
All Lab members can read/write files in the share area.
Lab members in the ppapadop_lab_share group are: ppapadop, ckhacher, itoufiqu, tandriol, iychang.
Each Lab member has a folder named by UCInetID that is private to the
user and to the PI.
Only the ppapadop and itoufiqu users can access files in the itoufiqu folder.
Only the ppapadop and ckhacher users can access files in the ckhacher folder.
Access is similar for the remaining Lab members’ folders.
You must either be on the campus network or connected to the
UCI campus VPN to access CRSP.
Your login credentials for all access methods described below are:
login name:
your UCInetID
password:
your password associated with your UCInetID
We do not set or change passwords.
You can access your granted CRSP storage from Windows, Mac, and Linux systems
via a few methods. The links in the table below provide installation
instructions:
CRSP Desktop clients are for accessing CRSP from Windows and macOS laptops.
We provide a licensed and branded version of the commercial software Mountain Duck.
Web browser access is for lightweight CRSP resource usage; it supports file or directory
uploads/downloads and provides in-browser edit capabilities for certain file types.
NFS mount provides access to CRSP’s Lab and HOME areas from HPC3.
Attention
The CRSP Desktop client is the currently supported SFTP-based software.
Although the CRSP storage system can be accessed via other
desktop clients such as FileZilla, WinSCP, and Cyberduck,
support for them is provided only on a best-effort basis.
All members of the group contribute to the quota in the group area.
It’s the sum total usage that counts.
Users with access to a PI’s lab areas
may have separate quota limits set by their PIs.
Quotas can be exceeded in number of files, total space used or both.
When quotas are exceeded, all group users can no longer write in the affected
filesystem and will need to remove some files and directories to free space.
Once successful you will see a simple text page with quotas for HOME and Lab areas.
When done, close the browser tab/window; there is no logout from this page.
When logged in on HPC3:
The CRSP quota info is updated on a regular basis and is put in your $HOME area on CRSP
in the file /share/crsp/home/UCInetID/quotas.txt. For example, for a user panteater:
The ls command gives an idea when the file was updated:
[user@login-x:~]$ cat /share/crsp/home/panteater/quotas.txt
Quota Report for panteater : 06/12/23 17:30
== Storage Areas that you own ==   (see a)
== Your use in Paths to which you have access ==
/mmfs1/crsp/home 0.001 GB/ 0.020 GB 6/40 files   (see b)
total bytes in use : 115.735 GB/ 0.000 GB
/mmfs1/crsp/lab/UCInetID-pi 39.799 GB/ 1024.000 GB 2900/100000 files   (see c)
total bytes in use : 374.092 GB/ 1024.000 GB
From the above output, the user panteater:
(a) Does not own any area (the user is not a PI).
(b) Has essentially no usage in the HOME area /mmfs1/crsp/home, which is expected.
The 0.001 GB is used only by account-related files. Currently the user
has 6 files against a quota of 40.
(c) Is a member of the UCInetID-pi Lab and has used 39.799 GB of the allocated 1024 GB Lab area
in /mmfs1/crsp/lab/UCInetID-pi and 2900 files (quota 100000).
The total usage of the Lab area by all lab members is 374.092 GB.
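The per-area numbers discussed above can also be turned into percentages with a short script. The following is a minimal sketch, not an official tool: the helper name quota_summary is hypothetical, and it assumes each usage line has the form "path used GB/ quota GB files-used/files-quota files", as in the example report for panteater.

```shell
# Hypothetical helper: print space and file-count usage as percentages
# for every per-area usage line in a CRSP quota report.
# Reads the files given as arguments, or standard input if none.
quota_summary() {
    # Assumed line format (taken from the example report):
    #   /mmfs1/crsp/home 0.001 GB/ 0.020 GB 6/40 files
    # The "total bytes in use" lines do not match and are skipped.
    awk '$1 ~ /crsp\// && $3 == "GB/" && $7 == "files" {
        split($6, n, "/")    # n[1] = files used, n[2] = files quota
        printf "%s space=%.1f%% files=%.1f%%\n", $1, 100 * $2 / $4, 100 * n[1] / n[2]
    }' "$@"
}

# Usage on HPC3 (path from the example above):
#   quota_summary /share/crsp/home/panteater/quotas.txt
```

For the example report this would print 5.0% space and 15.0% files for /mmfs1/crsp/home, and 3.9% space and 2.9% files for the Lab area.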
The path naming correspondence between CRSP and HPC3:
When a quota is filled, either in used space or in number of files, users will not be able to write any files
or directories, and submitted jobs will fail with quota exceeded errors.
For example, the following output of a quota check shows quotas exceeded for
the user panteater in number of files (a) and in storage used (b):
/mmfs1/crsp/home 0.014 GB/ 0.020 GB 40/40 files (a)
total bytes in use : 115.735 GB/ 0.000 GB
/mmfs1/crsp/lab/UCInetID-pi 1029.799 GB/ 1024.000 GB 2900/100000 files (b)
total bytes in use : 1029.799 GB/ 1024.000 GB
From now on:
if panteater tries to connect to CRSP using the CRSP Desktop client, the connection will fail;
if any other user in the lab tries to write in the shared Lab area, there will be a quota error.
The number of files quotas are reasonably set at the time of the account
creation. When the quota is exceeded we recommend that users:
Check what they wrote and remove any temporary files.
Use tar or zip commands to create single files from directories containing many small files,
and remove the original small files. Compressed files use less space.
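The tar recommendation above can be sketched as follows. This is a minimal illustration, not a supported tool: the helper name archive_dir and the directory name in the usage comment are hypothetical. It removes the original directory only after the archive has been written and can be listed back.

```shell
# Hypothetical helper: replace a directory of many small files with a
# single compressed archive named <directory>.tar.gz next to it.
archive_dir() {
    dir=${1%/}                      # strip a trailing slash, if any
    parent=$(dirname "$dir")
    name=$(basename "$dir")
    # Create the archive, check that it can be read back, and only
    # then remove the original directory.
    tar -czf "$dir.tar.gz" -C "$parent" "$name" &&
        tar -tzf "$dir.tar.gz" >/dev/null &&
        rm -rf "$dir"
}

# Usage (directory name is an example only):
#   archive_dir old-run-outputs
#   tar -xzf old-run-outputs.tar.gz    # unpack again when needed
```

Note that the archive is written before the originals are removed, so this needs some free space and one free file slot in the quota while it runs.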
Exceeding the file-count quota in $HOME is usually related to temporary files created
by Jupyter for each web-based access session.
While logged in on HPC3, check how many such files you have and remove older ones:
[user@login-x:~]$ ls -l /share/crsp/home/panteater/.local/share/jupyter/runtime/
total 1024
-rw-rw---- 1 panteater panteater 254 Jan 30 14:41 nbserver-114022.json
-rw-rw---- 1 panteater panteater 562 Jan 30 14:41 nbserver-114022-open.html
-rw-rw---- 1 panteater panteater 255 Mar 14 2022 nbserver-3966545.json
-rw-rw---- 1 panteater panteater 562 Mar 14 2022 nbserver-3966545-open.html
... cut lines ...
[user@login-x:~]$ rm /share/crsp/home/panteater/.local/share/jupyter/runtime/nbserver-3966545*
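When the runtime directory holds many such files, picking the old ones by hand is tedious. The following is a minimal sketch that lists nbserver files older than a given number of days; the helper name old_runtime_files and the 30-day default are assumptions made for illustration, and the -maxdepth option is assumed to be available (it is in GNU and BSD find).

```shell
# Hypothetical helper: list Jupyter nbserver-* files in a runtime
# directory that were last modified more than N days ago (default 30).
old_runtime_files() {
    dir=$1
    days=${2:-30}
    # -maxdepth 1 : look only at the directory itself, not below it
    # -mtime +N   : last modified more than N days ago
    find "$dir" -maxdepth 1 -type f -name 'nbserver-*' -mtime +"$days"
}

# Usage on HPC3 (path from the example above):
#   old_runtime_files /share/crsp/home/panteater/.local/share/jupyter/runtime 30
```

Review the printed list first, then remove the listed files with rm as in the example above.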
Note
If you only use web-based access for your CRSP lab space and never
log in on HPC3, you will need to submit a ticket asking us to remove such files.
Fix space quota
Usually quota violations happen when:
Users fill space over quota. Either reduce your usage (remove or compress some files)
or buy additional space (see Allocations).
Users run rsync or scp to transfer files, which results in wrong permissions.
Please see the fix DFS over quota
section for information on how to find files with the wrong group permission and how to fix them.
The only difference for CRSP is the path to the written files.
A snapshot of a file system is a logical, point-in-time, read-only copy of all files in a given CRSP file system.
It’s not really a complete copy. Instead, the file system keeps track of files that are changed
or deleted after the snapshot was made.
Default settings
All snapshots are read-only; you cannot delete a file from a snapshot.
All snapshots are labeled by date and time, the timezone is GMT (Greenwich Mean Time).
The names look like @GMT-YYYY.MM.DD-hh.mm.ss.
Snapshots are taken daily and kept for 89 days.
Files that were deleted/changed more than 90 days ago are gone forever.
Restoring a file from a snapshot is as simple as copying the file back to your desired location.
Each Lab has its own .snapshots directory.
Snapshots for the home area are kept in one place for ALL users.
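Because snapshot names encode a GMT timestamp, it can help to rewrite them in a more familiar form when comparing against local time. The following is a minimal sketch; the helper name snapshot_to_iso is hypothetical, and it relies only on the name format @GMT-YYYY.MM.DD-hh.mm.ss described above.

```shell
# Hypothetical helper: convert a snapshot name such as
# @GMT-2024.07.16-08.00.00 into an ISO 8601 UTC timestamp.
snapshot_to_iso() {
    # Names that do not match the expected format are printed unchanged.
    printf '%s\n' "$1" |
        sed -E 's/^@GMT-([0-9]{4})\.([0-9]{2})\.([0-9]{2})-([0-9]{2})\.([0-9]{2})\.([0-9]{2})$/\1-\2-\3T\4:\5:\6Z/'
}

# Usage (the snapshot name is an example only):
#   snapshot_to_iso '@GMT-2024.07.16-08.00.00'
```

The trailing Z marks the time as GMT/UTC, so the local Pacific time of the snapshot is 7 or 8 hours earlier depending on daylight saving.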
Is Snapshot a Backup?
Almost. Backups are generally thought of as historical copies of files to an offsite location.
In a traditional backup, users could go back in time months or years to recover a file.
A snapshot is a point-in-time virtual copy of a filesystem that is kept on the filesystem itself.
Snapshots:
provide some safety against the common "I accidentally deleted it" case.
Snapshots allow self-service restore of files/folders that you have recently deleted or overwritten.
Files created and deleted in the interval between two snapshots are not recorded in any
snapshot and cannot be recovered.
Offsite backups:
protect against total failure of CRSP itself (highly unlikely).
CRSP does not keep historical backups of data. However, there is an offsite copy of all CRSP data. In essence, every file
in CRSP has three copies: two in Irvine (one in each sub-cluster) and one off site in San Diego.
There are three ways to recover your data that was stored in the Lab area.
Using HPC3
Located at the top-level of your lab directory is the .snapshots directory.
This directory is owned by the root user and cannot be changed by any user.
Navigate to the .snapshots directory, where you will see directories that
have names in the format @GMT-YYYY.MM.DD-hh.mm.ss. This encoding
indicates date and time when the snapshot was taken. For
the lab ppapadop, on HPC3 you would find the ppapadop snapshots as below:
Check snapshots for the presence of desired files at the desired time stamp.
Once a good snapshot is identified, copy files or folders that you want to restore from
the snapshot back to the area where you want the file so that you can access it normally.
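The HPC3 steps above can be sketched as two small shell helpers. This is a minimal sketch under stated assumptions: the helper names snapshots_with and restore_from_snapshot are hypothetical, the lab path and file names in the usage comments are placeholders, and each snapshot directory is assumed to mirror the layout of the lab's top level.

```shell
# Hypothetical helper: print the names of all snapshots of a lab that
# contain the given path (relative to the top level of the lab).
snapshots_with() {
    lab=$1
    rel=$2
    # Snapshot names sort chronologically because of their
    # @GMT-YYYY.MM.DD-hh.mm.ss format.
    for snap in "$lab"/.snapshots/@GMT-*; do
        if [ -e "$snap/$rel" ]; then
            basename "$snap"
        fi
    done
}

# Hypothetical helper: copy a file or folder from one snapshot back to
# the same relative location in the live lab area.
restore_from_snapshot() {
    lab=$1
    snap=$2
    rel=$3
    dest_parent=$(dirname "$lab/$rel")
    mkdir -p "$dest_parent" &&
        cp -Rp "$lab/.snapshots/$snap/$rel" "$dest_parent/"
}

# Usage (LAB is a placeholder for the path of your lab on HPC3):
#   snapshots_with "$LAB" share/results.csv
#   restore_from_snapshot "$LAB" '@GMT-2024.07.16-08.00.00' share/results.csv
```

The restore overwrites a live file of the same name, so copy to a different location first if you want to compare the two versions.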
Using the CRSP Desktop
On a Mac, the .snapshots folder is hidden by default.
See the Mac connect share section for how
to view hidden folders in the Finder.
Click on the .snapshots folder at the top level of your already-configured lab share:
Fig. 4.11 .snapshots folder at the top-level of the lab
You will see a set of folders (tip: sort by name) that have the date and time when each snapshot was taken:
Find the date of interest, and then download the files/folders to your local system.
Restoring $HOME Data
Snapshots for the home area are kept in one place for ALL users.
Since $HOME areas usually don’t contain significant data, it can be more straightforward
to use the Web Interface.
Using HPC3
You can see all the home snapshots in /share/crsp/home/.snapshots.
They will have the naming format @GMT-YYYY.MM.DD-hh.mm.ss.
You can navigate into one of these snapshot directories and you will see the names of all user
home areas. You will only have permission to descend further into your own home area.
Once a good snapshot is found, just copy files or folders that you want to restore from the snapshot back to $HOME.