Overview

This document provides a quick orientation to using the datarmor cluster. It is not intended as a stand-alone tutorial, but rather as a quick-and-dirty orientation that can be complemented by following the links provided in the document and by searching online.

Accessing Datarmor documentation

Datarmor documentation, predominantly in French, can be accessed via the Portail Domicile using your extranet username and password and then selecting Documentation calculateur. The documentation can also be accessed directly from inside an Ifremer building.

Making a connection

If you are inside an Ifremer building, then you can connect directly to datarmor, but if you are outside of an Ifremer building, then you will need to connect in two distinct steps:

  1. Activate the PulseSecure VPN using your Ifremer extranet account credentials (i.e., username and password)
  2. Start an SSH session to datarmor.ifremer.fr using your normal, non-extranet datarmor username and password

From inside an Ifremer building, only step #2 is necessary. French instructions for both steps are provided at the documentation links mentioned above. For the first step, you will need to install the PulseSecure VPN client that can be downloaded from Ifremer; for the second step, you will need to install an SSH client. For Windows, the recommended SSH clients are PuTTY (simple, but effective) and MobaXterm (more modern and feature rich, including FTP and SFTP capabilities for file transfers in addition to SSH).
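
For example, once the VPN is active (or from inside an Ifremer building), a connection from a terminal-based SSH client looks like the following, where username is a placeholder for your datarmor username:

ssh username@datarmor.ifremer.fr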

Basic linux commands

Command            Description and example(s)

history            Show previous commands
                   history

exit or ctrl-d     Exit a terminal/SSH session
                   exit

mkdir              Make a directory
                   mkdir job-2021-10-12

ls                 List directory contents
                   ls        # List current directory
                   ls -alh   # More details, including permissions and file sizes
                   ls mydir  # List contents of a specific directory

cd                 Change directory
                   cd        # Change to home directory
                   cd test   # Change into directory named test
                   cd ~      # ~ = home directory
                   cd ..     # Go up one level
                   cd .      # . = current directory, so do nothing

rm                 Remove a file
                   rm test.txt   # Remove a single file
                   rm test*.txt  # Remove all files starting with "test" and ending with ".txt"

cp                 Copy a file or directory
                   cp test.txt newfile.txt
                   cp -r mydir mynewdir  # Copy an entire directory tree

pwd                Print working directory
                   pwd

cat                Show the contents of a file in the terminal
                   cat test.txt

more               Show file contents one page at a time; use q to exit
                   more test.txt

echo               Print text or variables to the command line
                   echo "My home directory=${HOME}"

grep               Search text files for a regular expression; useful for finding words in files
                   grep -i error *.txt

qsub, qstat, qdel  Submit and manage jobs in the PBSPro queuing system (see below)

conda              Manage conda environments (e.g., for R or Python; see the conda section below)

module             Load additional functionality into a shell session (see the Datarmor documentation and the Ichthyop section below)

man, info          Ways to get help on commands
                   man ls
                   info ls

For most commands, you can get basic assistance using -h or --help arguments (e.g., ls --help). There are also manuals for most commands that can be accessed with man and/or info (e.g., man ls or info ls).

If you execute a command and either want to stop it or are somehow stuck in it and want to get out, you can try the following steps in order, each more drastic than the last:

  1. Try q or :q<enter> or :qu<enter>. These will get you out of pagers like more and out of things that open in the vi/vim editor.
  2. ctrl-c executed in the offending terminal should kill most active command line instructions.
  3. If the command line seems to be ignoring your commands, try ctrl-q. This can happen after mistakenly typing ctrl-s, which makes the terminal go silent.
  4. The reset command can help a terminal that is printing bizarre characters. This can happen after using cat or more on things like binary files. Even if you cannot see what you are typing, type reset, hit enter, and see what happens.
  5. If all else fails and you cannot exit a command that you have executed, open a new shell terminal and use a combination of ps and kill (or kill -9 if you want to be particularly violent) to kill the specific process that will not stop. This is where you need to be careful not to kill the wrong thing.
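
For the last step, a minimal sketch (12345 is a placeholder for the process ID reported by ps):

ps -u $USER      # List your processes and their process IDs (PIDs)
kill 12345       # Ask the offending process to terminate
kill -9 12345    # Force-kill it if it ignores the polite request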

Disk space on datarmor

The details of disk space on datarmor are available in the documentation, but briefly, most users have three spaces in which they can store data, listed in order of increasing size and decreasing backups/permanence:

Space      Description
DATAHOME   Little space, but regularly backed up
DATAWORK   Decent amount of space and relatively regular backups
SCRATCH    Lots of space, fast, but regularly deleted and no backups

The location of these storage spaces can be determined using, e.g.:

echo $DATAHOME

In essence, DATAHOME is a shell variable that can be accessed by preceding the name of the variable with a $. Use the set command to see all known shell variables.

You can change directories (i.e., cd) into one of these directories using, e.g.:

cd $DATAWORK
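
If you want to check how much space a directory is using, or how much space is available on one of these filesystems, the standard du and df commands can help; e.g. (myproject is a placeholder directory name):

du -sh $DATAWORK/myproject    # Total size of a directory
df -h $SCRATCH                # Space available on the filesystem holding $SCRATCH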

Using the queuing system PBSPro / qsub

The best way to learn the details of how to use qsub is to consult the Datarmor documentation. Here I just provide a set of typical PBS scripts (the files typically submitted to qsub) and a few examples of how to use them.

The table below provides links to a set of standard PBS scripts that I use regularly:

PBS Script                 Description
run_r_script_array.pbs     For running an R script on the cluster, either as a single job or as an array job
run_r_script_parallel.pbs  For running an R script using foreach and doParallel
run_ichthyop.pbs           For running an array of ichthyop jobs, each defined by a specific configuration file

These scripts should be modified before placing them on the cluster and using them. In particular, adjust the PBS options at the start of the file (for run_r_script_parallel.pbs, make sure the NPROCS variable agrees with the PBS directives), decide which conda environment, if any, you will use, and set any specific directories or files you need (e.g., the location of the Ichthyop .jar file).
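
As a rough illustration of what the PBS options at the top of such a script look like, here is a minimal sketch; the job name, queue name, and resource values are placeholders, so check the Datarmor documentation and the comments in the scripts themselves for appropriate values:

#!/bin/bash
#PBS -N my_r_job                    # Job name (placeholder)
#PBS -q queue_name                  # Queue to submit to (placeholder; see the Datarmor documentation)
#PBS -l select=1:ncpus=4:mem=8gb    # Resources requested (placeholder values)
#PBS -l walltime=02:00:00           # Maximum run time
# For run_r_script_parallel.pbs, the NPROCS variable in the script body
# should agree with the ncpus value requested above.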

As an example of the use of these PBS scripts, to run a single R script contained in the file myscript.R, you can do:

qsub -v RSCRIPTNAME="myscript.R" run_r_script_array.pbs

To run the same script, but this time as an array of 10 jobs:

qsub -J 1-10 -v RSCRIPTNAME="myscript.R" run_r_script_array.pbs

To run a set of 11 ichthyop configuration files (with names starting with config.xml_), one can do:

bash # Change to a bash shell for generating file list
qsub -J 0-10 -v ICHTHYOP_CONF_LIST="$(ls config.xml_* | paste -s -d ' ')" run_ichthyop.pbs

Note that indexing in this last case starts at 0.
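
Inside run_ichthyop.pbs, each sub-job of the array then needs to pick out its own configuration file from ICHTHYOP_CONF_LIST using the PBS array index. The details are handled in the script itself, but the idea, assuming the script runs under bash, is roughly:

# Split the space-separated list into a bash array and select the entry
# for this sub-job; with -J 0-10, PBS_ARRAY_INDEX runs from 0 to 10,
# which is why indexing starts at 0 here
CONF_FILES=($ICHTHYOP_CONF_LIST)
CONF=${CONF_FILES[$PBS_ARRAY_INDEX]}
echo "This sub-job uses configuration file: $CONF"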

To check on the status of your jobs, use:

qstat -u $USER

To kill a running job, use the qdel command with the job ID.
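
For example, assuming qstat reports a job with ID 123456:

qdel 123456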

Suggestions for avoiding common issues

  • For each job on the cluster of any importance whatsoever, always create a separate directory with a descriptive name and date, containing all necessary configuration files. This will help you keep track of runs and debug problems.
  • I also generally maintain for each project a text file listing all job IDs and their properties.
  • For debugging, make sure to print lots of diagnostic messages and save all possible outputs. I often echo all shell and R commands so that a detailed record of all steps is created. set -x in PBS scripts can be very useful.
  • Interactive jobs can be very useful for debugging. You can start an interactive job using the following command: qsub -I -l walltime=01:00:00. Once started, you can walk line by line through a PBS script to identify issues.
  • Interactive jobs are helpful for errors at the start of the script, but for errors that appear only after the job has run for a long time, you will need to save the final state of the run just before the error so that it can be examined by hand outside a PBS job. Exactly how best to do this will depend on the programs you are running, but in R the functions tryCatch, on.exit and dump.frames may be helpful. Specifically, adding something along the lines of options(error = quote({dump.frames(); save.image(file = "last.dump.rda")})) to the R script can be useful (see the R documentation for dump.frames for more details).
  • Think carefully about the amount of disk space you are going to use and where you will store outputs to avoid issues like lost data.

Setting up R with packages using Conda

In my experience, the easiest way of ensuring that R works on the cluster with all the packages you need installed and functioning is to use conda. There can still be issues with conda package compatibility, but this remains the best way I have found to set up R on datarmor.

Using conda on the cluster will involve two steps:

  1. Setting up conda in your account
  2. Creating a specific R conda environment

Setting up your account

It is easiest to make conda available in every shell (e.g., SSH) session by adding the necessary source commands to the shell startup files. Execute the following commands in a shell session:

echo "source /appli/anaconda/latest/etc/profile.d/conda.csh" >> ~/.cshrc
echo "source /appli/anaconda/latest/etc/profile.d/conda.sh" >> ~/.bashrc

These commands will add a line to each of the files ~/.cshrc (i.e., a file named .cshrc in your home directory) and ~/.bashrc that configures your SSH sessions to be ready to use conda. Activate this change by logging out of datarmor and logging back in.
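
After logging back in, you can check that conda is now available with, e.g.:

conda --version
conda env list    # List the conda environments currently visible to you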

You should also make the standard conda environments already on the cluster available by executing the following in a shell session:

cat <<EOF > ~/.condarc
envs_dirs:
  - $DATAWORK/conda-env
  - /appli/conda-env
  - /appli/conda-env/2.7
  - /appli/conda-env/3.6
pkgs_dirs:
  - $DATAWORK/conda/pkgs
EOF

This will create the file ~/.condarc (or overwrite an existing one, so execute ls ~/.condarc first to make sure one is not already present). The file is read each time you log into datarmor, and any conda environments found in the listed directories will be available for activation.

Creating an R environment

Execute the following command in an SSH session to datarmor:

conda create -c conda-forge -n my_r_env r-essentials r-base r-sf r-rnetcdf r-rgeos r-rgdal r-tidyverse

The name my_r_env is arbitrary and can be changed to whatever you prefer. The packages from r-sf onward are optional, but I often use them in my own work. You can add any additional packages that you need for your specific work to the end of this list. See Conda Forge for the full list of available packages.

Once created, you can make R available in a shell session (including within PBS scripts) using:

conda activate my_r_env
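
To check that the environment is active and provides R, you can do, e.g.:

conda activate my_r_env
which R       # Should point to a location inside the my_r_env environment
R --version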

Uploading and downloading files

File transfers to and from the cluster can be done using at least three different methods:

  1. For (very) small transfers, one can use an SFTP session to the login node datarmor.ifremer.fr
  2. For larger transfers from inside an Ifremer building, one can copy files to and from datarmor using scp or sftp and connecting to the machine datacopy.ifremer.fr using your Ifremer intranet login and password. By default, files will be copied to and from your datarmor home directory.
  3. For larger transfers from outside an Ifremer building, copy the files you want to transfer into $SCRATCH/eftp on datarmor and then connect using FTP and your extranet credentials to eftp.ifremer.fr

For more instructions, see the documentation.
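
As an illustration of the second method (these commands are run from your local machine inside an Ifremer building; username is a placeholder for your Ifremer intranet username):

# Upload a local file to your datarmor home directory
scp myfile.txt username@datacopy.ifremer.fr:
# Download a file from datarmor to the current local directory
scp username@datacopy.ifremer.fr:myfile.txt .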

Installing Ichthyop on the cluster

The easiest way to install Ichthyop on the cluster is to install the version you intend to use on your own machine and then transfer the entire Ichthyop folder to datarmor using the transfer methods described above.

Once Ichthyop is on the cluster, you will need to activate java in each SSH session before using it:

module load java

This command can be added to the .cshrc and .bashrc files to ensure that it is executed every time a shell (e.g., SSH) session is started, or, alternatively, it can be added to your PBS scripts.
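
For example, following the same pattern used above for conda, the module load can be appended to both startup files with:

echo "module load java" >> ~/.cshrc
echo "module load java" >> ~/.bashrc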

To see the list of available modules and load a specific java version, you could execute:

# List all available modules
module avail

# Load a specific Java version
module load java/1.8.0

# List loaded modules
module list
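
Once java is loaded, an Ichthyop simulation can then be launched from a shell or a PBS script with something along the following lines; the .jar file name and the configuration file name are placeholders for whatever your own installation and run use:

java -jar ichthyop.jar myconfig.xml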