4. Data transfer
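If you need to bring data from your laptop or another host to the cluster, you will mainly use the scp command (there is an equivalent for Windows) or the rsync command.
Whichever program you use, make sure the transfer respects the setgid bit on the destination directories (the s in group permissions such as drwxr-sr-x, often loosely called the sticky bit); otherwise you can run into quota issues. Depending on the program, this means taking extra steps or avoiding some command-line options.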
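Before transferring into a group area, it can help to confirm that the destination directory on the cluster carries the bit discussed in this section, shown as s in the group permissions (technically the setgid bit). The function below is a minimal sketch using the shell test -g and GNU stat; the name check_dest is hypothetical, not an RCIC-provided tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a destination directory has the setgid
# bit set (the "s" in drwxr-sr-x). Run it on the cluster before transferring.
check_dest() {
    local dir=$1
    # %A = permissions as shown by ls -l, %G = group name, %n = path (GNU stat)
    stat -c '%A %G %n' "$dir" || return 2
    if [ -g "$dir" ]; then
        echo "setgid is set: new files will get group $(stat -c '%G' "$dir")"
    else
        echo "setgid is NOT set: new files will normally get your primary group" >&2
        return 1
    fi
}
```

For example, check_dest /dfsX/panteater_lab/panteater prints the permissions and group of that directory and returns a non-zero status if the bit is missing.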
4.1. Using scp
scp (secure copy) copies files between hosts over an encrypted SSH connection. It connects to a remote server and transmits the desired files over that connection.
Danger
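When scp transfers directories, the newly created directories at the destination do not inherit the setgid bit (the s in drwxr-sr-x).
This is not a problem when copying files to /pub/ucinetid.
It is a problem when copying to a /dfsX/group-lab-path area: files and directories created under a directory without the setgid bit get your personal group instead of the lab group, which usually results in quota exceeded errors.
There are two ways to deal with this.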
- Scenario 1
Copy the needed files with scp (add -r to copy directories recursively). For example, a user has access to the group allocation /dfsX/panteater_lab/panteater and wants to transfer data there.
On your laptop or other server, run the scp command:
$ scp -r mydata panteater@hpc3.rcic.uci.edu:/dfsX/panteater_lab/panteater
On HPC3, check the permissions on the transferred directory:
$ ls -l /dfsX/panteater_lab/panteater
total 138
drwxr-xr-x 6 panteater panteater_lab 18 Feb 18 13:10 mydata
Note that the permissions drwxr-xr-x are missing the s (the setgid bit is not set), which means all subdirectories under mydata are also missing it. You will need to fix the permissions on mydata:
$ chmod g+s /dfsX/panteater_lab/panteater/mydata
Then repeat chmod g+s on every subdirectory under it.
- Scenario 2
This requires less work and is more accurate.
On your laptop (or remote server), create a compressed tar file of the files you want to transfer, then copy this compressed file with scp:
$ tar czvf mydata.tar.gz mydata
$ scp mydata.tar.gz panteater@hpc3.rcic.uci.edu:/dfsX/panteater_lab/panteater
On the cluster, extract the transferred file and check the permissions:
$ cd /dfsX/panteater_lab/panteater
$ tar xzf mydata.tar.gz
$ ls -l
total 138
drwxr-sr-x 6 panteater panteater_lab 18 Feb 18 13:12 mydata
$ ls -l mydata
total 124
-rw-r--r-- 1 panteater panteater_lab 17075 Jul 21 2020 desc.cvs
-rwxr-xr-x 1 panteater panteater_lab 7542 Jul 21 2020 README
drwxr-sr-x 2 panteater panteater_lab 4 Feb 18 12:03 common
drwxr-sr-x 2 panteater panteater_lab 3 Feb 18 12:03 images
Note that the permissions drwxr-sr-x on mydata include the s, and all directories under mydata inherited it. Delete the transferred mydata.tar.gz after verification.
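For deep directory trees, running chmod by hand on each subdirectory (Scenario 1) is tedious, and checking every directory after extraction (Scenario 2) is easier with find. The function below is a sketch that lists, and with --fix repairs, every directory at or below a path that lacks the setgid bit; the name fix_setgid is hypothetical. chmod g+s does not change the group of files and directories that already exist, so if ls -l shows your personal group on them instead of the lab group, a chgrp to the lab group may also be needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report (and optionally fix) directories under a
# transferred tree that are missing the setgid bit ("s" in drwxr-sr-x).
# Usage: fix_setgid DIR [--fix]
fix_setgid() {
    local top=$1 mode=${2:-}
    # Print every directory at or below $top whose setgid bit is not set.
    find "$top" -type d ! -perm -g=s -print
    if [ "$mode" = "--fix" ]; then
        # Set the bit on those directories only; regular files are untouched.
        find "$top" -type d ! -perm -g=s -exec chmod g+s {} +
    fi
}
```

For example, fix_setgid /dfsX/panteater_lab/panteater/mydata prints nothing when every directory is correct (the expected result after Scenario 2), and fix_setgid /dfsX/panteater_lab/panteater/mydata --fix sets the bit where it is missing (Scenario 1).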
4.2. Using rsync
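rsync is a file-copying program that can greatly speed up file transfers, because it only transfers files (or parts of files) that differ between the source and the destination.
See man rsync for more information and available options.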
Two rsync options overwrite the destination permissions, and this is a common issue that users encounter when transferring data:
-p, --perms
    preserve permissions
-a, --archive
    archive mode; same as -rlptgoD, implies -p
Important
When the -p option is used, rsync preserves the permissions of the source, which is not correct for files and directories in the destination that need to comply with user:group permissions.
Avoid using the -p and -a options when running rsync commands.
For example, to recursively copy a local directory and show verbose output, use:
$ rsync -rv mydata panteater@hpc3.rcic.uci.edu:/dfsX/panteater_lab/panteater
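The rsync example can be wrapped in a small function that keeps the permission-related options out. This is a sketch: the name sync_no_perms is hypothetical, and adding -l (copy symlinks as symlinks) and -t (preserve modification times) is an assumption beyond the -rv shown above. Neither option preserves permissions, and -t lets later runs skip unchanged files quickly, because rsync's default check compares file size and modification time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy with rsync WITHOUT -p or -a, so the destination's
# group permissions (including the setgid bit) are not overwritten.
#   -r  recurse into directories
#   -l  copy symlinks as symlinks
#   -t  preserve modification times (does not touch permissions)
#   -v  verbose output
# Usage: sync_no_perms SRC DEST [extra rsync options, e.g. -n for a dry run]
sync_no_perms() {
    local src=$1 dest=$2
    shift 2
    rsync -rltv "$@" "$src" "$dest"
}
```

For example, sync_no_perms mydata panteater@hpc3.rcic.uci.edu:/dfsX/panteater_lab/panteater -n previews the transfer (-n is rsync's --dry-run); running it again without -n performs it.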
4.3. Using Aspera
Aspera is not installed cluster-wide, because the Aspera client needs to be installed by each user in a user-writable area.
Download
You will need to download the Aspera Connect software from https://www.ibm.com/aspera/connect/ and install it. On the download page, copy the URL for Linux and paste it into the wget command to download the file:
$ wget https://d3gcli72yxqn2z.cloudfront.net/downloads/connect/latest/bin/ibm-aspera-connect_4.2.8.540_linux_x86_64.tar.gz
The command above saves the file as ibm-aspera-connect_4.2.8.540_linux_x86_64.tar.gz. The version in this example is 4.2.8.540; it will differ once a newer version becomes available.
Install
Replace VERSION with the version number of your download in the following commands:
$ tar -zxvf ibm-aspera-connect_VERSION_linux_x86_64.tar.gz
$ ./ibm-aspera-connect_VERSION_linux_x86_64.sh
This creates the $HOME/.aspera/connect directory, which contains all needed components of the Aspera Connect client, such as the compiled binaries and certificates.
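To avoid typing the version number by hand, the two install commands can be derived from the downloaded file name. This is a sketch that assumes, as in the example download, that the installer script inside the archive has the same base name as the tarball with .sh in place of .tar.gz; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the installer script name for a downloaded
# tarball, ASSUMING it shares the tarball's base name (.tar.gz -> .sh).
aspera_installer_name() {
    local name=${1##*/}              # strip any leading directory
    printf '%s\n' "${name%.tar.gz}.sh"
}

# Hypothetical helper: extract the tarball into the current directory and run
# its installer, which creates $HOME/.aspera/connect.
install_aspera() {
    local tarball=$1 installer
    installer=$(aspera_installer_name "$tarball")
    tar -zxvf "$tarball" || return 1
    [ -f "$installer" ] || { echo "installer $installer not found" >&2; return 1; }
    ./"$installer"
}
```

For example, install_aspera ibm-aspera-connect_4.2.8.540_linux_x86_64.tar.gz runs the same two commands shown above with the version filled in.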
Use
Sites that require the Aspera client for uploads or downloads usually provide specific instructions on how to connect to their Aspera servers.
The following example downloads a fastq file from a remote server to a local directory dir1. The command is broken into multiple lines with \ for readability:
$ $HOME/.aspera/connect/bin/ascp \
    -v \
    -P33001 \
    -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh \
    era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR179/003/SRR1798143/SRR1798143.fastq.gz dir1/
-v
    use verbose mode
-P33001
    the initial TCP connection port. Your server may require a different port. Our network settings allow such high-numbered ports to be opened for the transfer.
-i
    the private key file created during the install.
Any other flags will depend on the Aspera server setup. For additional help on usage:
$ $HOME/.aspera/connect/bin/ascp -h
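When several files are needed from the same server, the example command can be repeated over a list of remote paths. The function below is a sketch: ascp_fetch_list and the ASCP, ASPERA_KEY, and ASPERA_SERVER variables are hypothetical names, and the server and port are the EBI values from the example above, which will differ for other sites.

```shell
#!/usr/bin/env bash
# Defaults point at the locations created by the install step; the server and
# port are the EBI values from the example above (assumptions for other sites).
ASCP=${ASCP:-$HOME/.aspera/connect/bin/ascp}
ASPERA_KEY=${ASPERA_KEY:-$HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh}
ASPERA_SERVER=${ASPERA_SERVER:-era-fasp@fasp.sra.ebi.ac.uk}

# Usage: ascp_fetch_list LIST DEST
#   LIST: text file with one remote path per line (blank lines are skipped)
#   DEST: local directory, created if missing
ascp_fetch_list() {
    local list=$1 dest=$2 remote
    mkdir -p "$dest" || return 1
    # "|| [ -n ... ]" also handles a last line without a trailing newline
    while IFS= read -r remote || [ -n "$remote" ]; do
        [ -n "$remote" ] || continue
        # </dev/null so ascp cannot consume the remaining lines of LIST
        "$ASCP" -v -P33001 -i "$ASPERA_KEY" \
            "$ASPERA_SERVER:$remote" "$dest/" </dev/null || return 1
    done < "$list"
}
```

For example, with one path such as vol1/fastq/SRR179/003/SRR1798143/SRR1798143.fastq.gz per line in files.txt, ascp_fetch_list files.txt dir1 downloads each file into dir1 and stops at the first failed transfer.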