CRSP
Overview
CRSP is a network-based multi-Petabyte storage cluster for the UCI campus research community. It is a reliable and resilient location created for researchers across UCI to store and share their research data.
CRSP is available across the network and supports multiple modes of storing and retrieving data, including web browsers, “folders” on laptops or desktops, and ordinary file paths on UCI’s high-performance computing clusters.
- While there are many possible use cases, a driving one is:
A faculty researcher needs to store the data that supports the research lab, graduate students, and postdocs. In this model, a lab “area” is created on CRSP and is logically owned by the researcher. The lab area owner can specify additional users who have read/write capability on the area and how much space each can consume.
Warning
CRSP filesystems must not be used to store personally identifiable information that would fall under guidelines such as FERPA (e.g., student data) and HIPAA (health-care data).
If you are unsure whether CRSP is suitable for your data, please refer to the general guidance for data security provided by the UCI Office of Research.
CRSP technologies
- CRSP is a combination of several technologies
It is built with industry-leading storage technology to ensure high data availability and resiliency.
It is multi-site and is composed of commodity server components from Dell for cost-effective scaling and performance.
The underlying parallel file system is GPFS.
Additional support and integration software from Arcastream.
- Features include
An active-active storage system setup between two hosting locations for high availability and redundancy, with fully fault tolerant high speed networking.
End-to-End 24x7 software and hardware support.
A fully encrypted file system, featuring encryption at rest to ensure user data security.
Several user access methods are in place, with enterprise level support. All access mechanisms are fully load-balanced between data centers.
A file system design that has massive scaling capabilities without compromising performance.
A front-end access layer design that is capable of scaling horizontally as demand grows.
Multiple user access methods, assuring a superior level of user experience.
Allocations
CRSP is funded through central campus to guarantee a fixed amount of no-cost storage to any faculty member or staff researcher who requests space. These campus funds pay for the people, the baseline infrastructure, and vendor maintenance required to provide the robust infrastructure.
CRSP allocations are provided for UCI faculty members as follows:
- No cost baseline allocation
1TB quota per researcher
- Recharge allocation - Lab area
Researchers who require more capacity than the baseline allocation can purchase additional capacity. Please see Storage Related Recharges and Purchase CRSP storage.
In general, users do not get a default CRSP allocation. The allocation owners can grant access to their spaces to students, postdocs, and other faculty members.
The allocation is associated with an account.
Getting CRSP Account
All requests described below must be sent to hpc-support@uci.edu
I’m a researcher on campus and I want to have access
If you are ladder-rank faculty or have an exception granted by the UCI Office of Research to act as PI on federal grants, your account should be pre-created. If you still do not have access, please send us a request.
I’m a researcher and I want to access my colleague’s lab
Your colleague must send a request asking that you be given access to their lab.
I’m a researcher and I want colleagues outside of UCI to have access to my lab
You must first sponsor a UCINetID (see Access), then send a request to grant access.
I’m a researcher and I want to add students/postdocs to my lab
You should send a request and include:
- your existing CRSP lab name
- UCINetIDs and names of the people that you want to add
- whether any of these people should have individual limits and what the limits are. The default behavior is no individual limit.
I’m a student/postdoc
Your PI should send a request and include:
- your UCINetID
- whether your space should have an individual limit.
A PI may combine multiple requests in a single email.
Accessing CRSP
You must either be on the campus network or connected to the UCI campus VPN to access CRSP.
You can access your granted CRSP storage from Windows, macOS, and Linux systems via a few methods. The client links in the table below provide installation instructions:
Client | Description
CRSP Desktop | CRSP Desktop clients are for accessing CRSP from Windows and macOS laptops. We provide a licensed and branded version of the commercial software Mountain Duck.
Web browser | This access is for lightweight CRSP usage. It supports file or directory uploads and downloads and provides in-browser editing for certain file types.
SSHFS | SSHFS can be used for accessing CRSP shares from a Linux laptop or desktop.
NFS mount | The NFS mount on HPC3 provides access to CRSP’s LAB and HOME areas.
Attention
Although the CRSP storage system can be accessed via other commercial or open-source desktop clients such as FileZilla, WinSCP, and CyberDuck, the CRSP Desktop client is the currently supported SFTP-based software. Support for other desktop clients is provided on a best-effort basis only.
Consult our CRSP Troubleshooting if you have trouble accessing your CRSP shares.
Quotas
There are two ways to check your quotas:
Using a web browser, go to https://access.crsp.uci.edu/quota. You will be asked to authenticate (DUO); once successful, you will see a simple text page indicating your quotas for the HOME and LAB areas.
When you are logged in to HPC3, you can view your CRSP quota directly. The file /share/crsp/home/USERNAME/quotas.txt in your CRSP HOME area provides quota information:
[user@login-x:~]$ ls -ld /share/crsp/home/panteater
drwx-----T 7 panteater panteater 2048 May 10 15:28 /share/crsp/home/panteater
[user@login-x:~]$ cat /share/crsp/home/panteater/quotas.txt
Quota Report for panteater : 06/12/23 17:30
== Storage Areas that you own ==
== Your use in Paths to which you have access ==
/mmfs1/crsp/home 0.001 GB/ 0.020 GB 6/40 files
total bytes in use : 115.735 GB/ 0.000 GB
/mmfs1/crsp/lab/ucinetid-pi 39.799 GB/ 1024.000 GB 2900/100000 files
total bytes in use : 374.092 GB/ 1024.000 GB
The first command above gives an idea of when the file was updated. The second command shows that the user panteater:
does not own any area (the user is not a PI).
has almost no usage in the HOME area /mmfs1/crsp/home, which is expected: the 0.001 GB is used only by account-related files.
is a member of the ucinetid-pi lab and has used 39.799 GB of the 1024 GB allocated to the LAB area in /mmfs1/crsp/lab/ucinetid-pi. The total usage of the LAB area by all lab members is 374.092 GB.
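The per-area numbers in the report can also be turned into a percentage of quota used. The following is a minimal sketch, not an official CRSP tool. The function name crsp_quota_usage is hypothetical, and the sketch assumes that every area path starts with /mmfs1/ and is followed by the tokens "<used> GB/ <quota> GB", as in the sample report above. If the report format differs, the parsing needs to be adjusted.

```shell
# Print "<area path> <percent of quota used>" for each area in a quotas.txt file.
# Assumption: an area path beginning with /mmfs1/ is followed by the tokens
# "<used> GB/ <quota> GB", as in the sample report.
crsp_quota_usage() {
    # Split on any whitespace so the parsing does not depend on line layout.
    tr -s '[:space:]' '\n' < "$1" | awk '
        /^\/mmfs1\// { path = $0; n = 0; next }
        path != "" {
            n++
            if (n == 1) used = $0 + 0        # first token after the path
            if (n == 3) {                    # third token is the quota
                quota = $0 + 0
                pct = (quota > 0) ? 100 * used / quota : 0
                printf "%s %.1f%%\n", path, pct
                path = ""                    # ignore everything up to the next path
            }
        }'
}
```

On HPC3 it could be called as crsp_quota_usage /share/crsp/home/panteater/quotas.txt. The "total bytes in use" figures are skipped on purpose, because they describe the whole area rather than your own usage.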
Note the path naming on CRSP and HPC3:
Area | Path on CRSP | Path on HPC3
HOME | /mmfs1/crsp/home | /share/crsp/home
LAB | /mmfs1/crsp/lab/ucinetid-pi | /share/crsp/lab/ucinetid-pi
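Because the quota report prints CRSP-side paths while HPC3 uses the /share/crsp mount, it can help to translate one form into the other. A small sketch of that mapping follows. The function name crsp_to_hpc3_path is hypothetical, and the sketch assumes the prefix substitution shown in the table applies to every path under /mmfs1/crsp.

```shell
# Translate a CRSP-side path (/mmfs1/crsp/...) into its HPC3 path (/share/crsp/...).
crsp_to_hpc3_path() {
    case "$1" in
        /mmfs1/crsp/*) printf '%s\n' "/share/crsp/${1#/mmfs1/crsp/}" ;;
        *)             printf '%s\n' "$1" ;;   # not a CRSP path: print unchanged
    esac
}
```

For example, crsp_to_hpc3_path /mmfs1/crsp/lab/ucinetid-pi prints /share/crsp/lab/ucinetid-pi.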
Note
If you are the PI of the lab, you will see the usage of your lab quota for all lab members. If you are a member of the lab, you will see only what you have used from the lab quota allocation.
Snapshots
A snapshot of a file system is a logical, point-in-time, read-only copy of all files. It is not a complete physical copy; instead, the file system keeps track of files that are changed or deleted after the snapshot was made.
Default settings
By definition, all snapshots are read-only, meaning you cannot delete a file from a snapshot. Restoring a file from a snapshot is as simple as copying the file back to your desired directory/folder.
On CRSP, all snapshots are labeled by date and time. The timezone is GMT (Greenwich Mean Time).
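Since the snapshot labels are in GMT, it can be convenient to convert a label to local time before deciding which snapshot to use. The sketch below is one way to do this on a Linux system. It assumes GNU date (for the -d option) and labels of the form @GMT-YYYY.MM.DD-HH.MM.SS as seen on CRSP. The function name snapshot_local_time is hypothetical.

```shell
# Convert a snapshot label such as @GMT-2021.08.08-10.00.00 to local time.
# Requires GNU date; the local zone is taken from the TZ environment setting.
snapshot_local_time() {
    local name="${1#@GMT-}"     # 2021.08.08-10.00.00
    local day="${name%%-*}"     # 2021.08.08
    local clock="${name#*-}"    # 10.00.00
    day=$(printf '%s' "$day" | tr . -)        # 2021-08-08
    clock=$(printf '%s' "$clock" | tr . :)    # 10:00:00
    date -d "$day $clock UTC" '+%Y-%m-%d %H:%M:%S %Z'
}
```

Calling snapshot_local_time '@GMT-2021.08.08-10.00.00' prints that instant in the time zone of the machine where it runs.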
Snapshots are taken:
daily, keep last 14
weekly, keep last 8
Attention
Files that were deleted more than 8 weeks ago are gone forever.
Is Snapshot a Backup?
Not really. Backups are generally thought of as historical copies of files, and users could go to a backup to recover a file from many months ago. Snapshots provide some safety against the common “accidentally deleted” use case. Files created and deleted in the interval between two snapshots are not recorded in any snapshot and cannot be recovered. CRSP does not keep historical backups of data.
Location
Due to the architecture of the underlying filesystem (GPFS) you must first navigate to the top level of the CRSP file system and then navigate downwards to the correct snapshot to find yours.
This means that you will see the names of all labs and home area folders (there are thousands of them on CRSP). Rest assured that only you and those you designate can see any files inside.
Important
All access permissions are fully enforced, even when navigating snapshots.
Each snapshot is a directory that is named after its creation date. The snapshots are held in:
HOME-SNAPSHOTS - directory for HOME area snapshots
LAB-SNAPSHOTS - directory for LAB area snapshots
From HPC3
The top level of the CRSP file system is mounted as /share/crsp, so the snapshots are available in /share/crsp/HOME-SNAPSHOTS and /share/crsp/LAB-SNAPSHOTS.
For example, a user panteater can find HOME area snapshots as:
[user@login-x:~]$ ls /share/crsp/HOME-SNAPSHOTS
@GMT-2021.07.11-10.00.00 @GMT-2021.08.06-01.00.14 @GMT-2021.08.10-13.00.07
@GMT-2021.07.18-10.00.00 @GMT-2021.08.07-01.00.14 @GMT-2021.08.11-01.00.14
@GMT-2021.07.25-10.00.00 @GMT-2021.08.08-01.00.14 @GMT-2021.08.11-13.00.07
@GMT-2021.08.01-10.00.00 @GMT-2021.08.08-10.00.00 @GMT-2021.08.12-01.00.14
@GMT-2021.08.03-01.00.14 @GMT-2021.08.09-01.00.14 @GMT-2021.08.12-13.00.07
@GMT-2021.08.04-01.00.14 @GMT-2021.08.09-13.00.07 @GMT-2021.08.13-01.00.14
@GMT-2021.08.05-01.00.14 @GMT-2021.08.10-01.00.14 @GMT-2021.08.13-13.00.07
And then browse the contents of a specific snapshot using your UCINetID as:
[user@login-x:~]$ ls /share/crsp/HOME-SNAPSHOTS/@GMT-2021.08.08-10.00.00/panteater
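When it is not obvious which snapshot still holds a particular file, the check can be repeated across all snapshots. The following sketch does this. The function name snapshots_containing and the file name notes.txt used in the usage note are hypothetical; the snapshot root and the @GMT- prefix are taken from the listing above.

```shell
# List every snapshot under a snapshot root that contains a given path.
# $1: snapshot root, for example /share/crsp/HOME-SNAPSHOTS
# $2: path relative to the snapshot, for example panteater/notes.txt
snapshots_containing() {
    local root="$1" rel="$2" snap
    for snap in "$root"/@GMT-*; do
        # -e matches files and directories; snapshots without the path are skipped.
        if [ -e "$snap/$rel" ]; then
            printf '%s\n' "${snap##*/}"
        fi
    done
    return 0
}
```

For example, snapshots_containing /share/crsp/HOME-SNAPSHOTS panteater/notes.txt prints one snapshot label per line. An empty result means that no retained snapshot has the file.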
From CRSP Desktop
In your CRSP Desktop application connect to the crsp-top-level share connection (it is predefined in the CRSP Desktop installation). See CRSP Desktop App for Windows or CRSP Desktop App for macOS for detailed instructions.
Once at the top level, you will find snapshots labeled by their creation date in the folders labeled HOME-SNAPSHOTS and LAB-SNAPSHOTS.
From web browser
In your Web based File Browser interface navigate to the CRSP top level, you will see a folder structure that is similar to the following:
Snapshots are held in the folders labeled HOME-SNAPSHOTS and LAB-SNAPSHOTS. To find available snapshots for LAB area click on LAB-SNAPSHOTS:
In this example, the most recent snapshot is the last one listed. Its name indicates the timestamp when the snapshot was taken: May 5, 2021 at 19:00:01 (GMT). This translates to May 5, 2021, 12:00:01 PM Pacific Daylight Time (PDT). This snapshot contains a logical copy of all CRSP lab folders as they were at that point in time.
Deleted Files Recovery
A common mistake is an accidental file deletion. In many cases, but not all, users can retrieve a previous copy of the file.
If the file you just deleted was created prior to the most-recent snapshot, you can get a copy of the file as it was when the snapshot was created.
Any changes made after the most recent snapshot are lost.
If you wait longer than the time specified in Default settings to recover a deleted file, it is gone forever.
The following steps explain how to recover a deleted file from a snapshot using different access methods.
From CRSP Desktop
Use your CRSP Desktop application to connect to the desired share (see CRSP Desktop App for Windows or CRSP Desktop App for macOS for instructions) then use it just like a folder or network drive to copy desired files and folders from a specific snapshot.
From HPC3
You can use the usual Unix commands ls, cd, and cp to find and copy the desired files and directories from the snapshot to the location where you need to restore them. For example, a user panteater who has access to peterlab can restore a single file accidentally deleted from the LAB area:
[user@login-x:~]$ cd /share/crsp/lab/peterlab/panteater
[user@login-x:~]$ cp /share/crsp/LAB-SNAPSHOTS/@GMT-2021.08.08-10.00.00/peterlab/panteater/important-file important-file
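The manual copy above can be wrapped in a small helper that picks the most recent snapshot holding the file. This is only a sketch under stated assumptions: the function name restore_latest is hypothetical, and it relies on snapshot labels being zero-padded so that alphabetical order matches chronological order. It refuses to overwrite an existing file in the destination.

```shell
# Copy the most recent snapshot version of a file into a destination directory.
# $1: snapshot root, for example /share/crsp/LAB-SNAPSHOTS
# $2: path relative to the snapshot, for example peterlab/panteater/important-file
# $3: existing destination directory
restore_latest() {
    local root="$1" rel="$2" dest="$3" snap newest="" target
    # The shell expands the glob in sorted order, so the last match is the newest.
    for snap in "$root"/@GMT-*; do
        if [ -f "$snap/$rel" ]; then
            newest="$snap"
        fi
    done
    if [ -z "$newest" ]; then
        echo "no snapshot contains $rel" >&2
        return 1
    fi
    target="$dest/${rel##*/}"
    if [ -e "$target" ]; then
        echo "refusing to overwrite $target" >&2
        return 1
    fi
    cp -p "$newest/$rel" "$target"    # -p keeps the original modification time
}
```

For the example above, the call would be restore_latest /share/crsp/LAB-SNAPSHOTS peterlab/panteater/important-file /share/crsp/lab/peterlab/panteater.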
From web browser
To recover the file, navigate to the top level in the File Browser and then into LAB-SNAPSHOTS. From there, find the snapshot (folder) that has a copy of your file.
In the following example, the path starts with LAB-SNAPSHOTS / @GMT-2019.5.13-19.00.1, which indicates that we navigated into the specific snapshot @GMT-2019.5.13-19.00.1 in the LAB area. The rest of the path is the location of the desired file, module-hpc.log-20201011.
Once the desired file is found:
(1) Select the desired files by checking the box to the left of each file name.
(2) Click Download to download the selected files to your desired writable folder.
[Figure: Selecting files in snapshots]
At that point, you have restored your desired files from the snapshot.
You may also copy the files in your usual manner for your host operating system (Windows, macOS, or Linux).