4. Data transfer
If you need to bring some data from your laptop or another host to the
cluster you will mainly want to use scp
(there is an equivalent for
Windows) or rsync
commands.
Important
You will have to use correct command-line parameters to ensure that the data transfer program you use will respect the sticky bit and not cause quota issues.
4.1. Using scp
The scp
is a secure file transfer protocol. It allows one to connect to a remote server
and transmit desired files via the connection.
Danger
When files are transferred the destination sticky bits on directories are not inherited.
This is not a problem if the users are copying files to /pub/UCInetID
This is a problem when copying to /dfsX/group-lab-path area and it usually results in quota exceeded errors.
There are 2 ways to deal with this.
- Scenario 1
Transfer needed files (using recursive directives if needed). For example, a user has an access to a group allocation /dfsX/panteater_lab/user1 and want to transfer data there.
On your laptop or other server run
scp
command:$ scp -r mydata panteater@hpc3.rcic.uci.edu:/dfsX/panteater_lab/user1
On HPC3 check the permissions on the transferred directory:
$ ls -l /dfsX/panteater_lab/user1 total 138 drwxr-xr-x 6 user1 panteater_lab 18 Feb 18 13:10 mydata
Note, the permissions drwxr-xr-x are missing s (sticky bit is not set) and this means all subdirectories under mydata are also missing it. Will need to fix the permissions on mydata:
$ chmod g+s /dfsX/panteater_lab/user1/mydata*
Similarly, repeat
chmod
on all subdirectories under it.- Scenario 2
This requires less work and is more accurate.
On your laptop (or remote server) create a compressed tar file of the files you want to transfer and then scp this compressed file:
$ tar czvf mydata.tar.gz mydata $ scp -r mydata.tar.gz panteater@hpc3.rcic.uci.edu:/dfsX/panteater_lab/user1
On the cluster, uncompress transferred file and check permissions:
$ cd /dfsX/panteater_lab/user1 $ tar xzf mydata.tar.gz $ ls -l total 138 drwxr-sr-x 6 user1 panteater_lab 18 Feb 18 13:12 mydata $ ls -l mydata total 124 -rw-r--r-- 1 user1 panteater_lab 17075 Jul 21 2020 desc.cvs -rwxr-xr-x 1 user1 panteater_lab 7542 Jul 21 2020 README drwxr-sr-x 2 user1 panteater_lab 4 Feb 18 12:03 common drwxr-sr-x 2 user1 panteater_lab 3 Feb 18 12:03 images
Note, the permissions drwxr-sr-x on mydata include s and all directories under mydata inherited it. Delete transferred mydata.tar.gz after verification.
4.2. Using rsync
The rsync
is a program that allows to greatly speed up file transfers.
See man rsync
for more information and options to use.
There are two options in rsync
command that will overwrite the destination
permissions and it is a common issue that the users encounter when transferring data:
-p, --perms
preserve permissions-a, --archive
archive mode; same as-rlptgoD
, implies-p
Important
When
-p
option is used,rsync
preserves the permissions of the source and this is not correct for the destination server where files and directories need to have very specific user:group permissions.Avoid using
-p
and-a
options when runningrsync
commands.
For example, for a recursive copy of a local directory and to show a verbose output one can use:
$ rsync -rv mydata panteater@hpc3.rcic.uci.edu:/dfsX/panteater_lab/user1
4.3. Using Aspera
There is no installation of Aspera cluster-wide as the Aspera client needs to be installed by the user in a user-writable area.
Download
You will need to download and install Aspera Connect software from: https://www.ibm.com/aspera/connect/. Copy the URL for Linux on the download page and paste into
wget
command to download:$ wget https://d3gcli72yxqn2z.cloudfront.net/downloads/connect/latest/bin/ibm-aspera-connect_4.2.8.540_linux_x86_64.tar.gz
Per above, a file is saved as ibm-aspera-connect_4.2.8.540_linux_x86_64.tar.gz. Note, available version for this example is 4.2.8.540, and it will change when a new version becomes available.
Install
Use the correct version number from your download in the following commands:
$ tar -zxvf ibm-aspera-connect-VERSION_linux_x86_64.tar.gz $ ./ibm-aspera-connect-VERSION_linux_x86_64.sh
This will result in creating $HOME/.aspera/connect directory which will have all needed components of the Aspera Connect client as far as compiled binary, certificates, etc.
Use
Sites that require using Aspera Client for upload/download usually provide specific instructions how to connect to their Aspera servers.
The following example shows a download of a fastq file from a remote server to a local directory dir1. Command is broken with \ into multiple lines for readability:
$ $HOME/.aspera/connect/bin/ascp \ -v \ -P33001 \ -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh \ era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR179/003/SRR1798143/SRR1798143.fastq.gz dir1/
-v
use verbose mode-P33001
is the initial TCP connect port. Your server may need other port identified. We have network settings to allow such high numbered ports to be opened for the transfer.-i
is the private key file created during the install.
Any other flags will depend on the Aspera server setup. For additional help on usage:
$ $HOME/.aspera/connect/bin/ascp -h