1. Connecting
  2. Managing your Account
  3. Storage
  4. Data Transfer
  5. Managing Your Environment (Modules)
  6. Programming Environment
  7. Running Jobs
  8. Job Dependencies
  9. Job Arrays
  10. Running MATLAB Batch Jobs
  11. Running Mathematica Batch Jobs
  12. R Software
  13. Python Software
  14. HPC & Other Tutorials
  15. Investor Specific Information

1. Connecting

The Campus Cluster can be accessed via Secure Shell (SSH) to the head nodes using your official University NetID login and password. Generally, Unix/Linux-based systems include an SSH client by default; however, desktops/laptops running versions of Windows prior to Windows 10 version 1803 do not. Third-party software is available for Windows users to access the Campus Cluster. Please see this non-exhaustive list of SSH clients that can be used to access the Campus Cluster.

Below is a list of hostnames that provide round-robin access to head nodes of the Campus Cluster instances as indicated:

Access Method   Hostname                               Head Node
SSH             cc-login.campuscluster.illinois.edu    namehN (e.g., cc-login1, golubh1)

SSH

A variety of SSH-based clients are available for accessing the Campus Cluster from your local system. There are two types of SSH clients: those that support both remote login and data transfers, and those that support data transfers only.

SSH Client        Remote Login   Data Transfer   Installs On

MobaXterm         Yes            Yes             Windows
    An enhanced terminal with an X server and a set of Unix commands (GNU/Cygwin) packaged in the application.

SSH Secure Shell  Yes            Yes             Windows
    Allows you to log in securely to remote host computers, execute commands safely on a remote computer, and provide secure encrypted and authenticated communications between two hosts in an untrusted network.

Tunnelier         Yes            Yes             Windows
    A flexible SSH client which includes terminal emulation, graphical as well as command-line SFTP support, an FTP-to-SFTP bridge, and additional tunneling features including dynamic port forwarding through an integrated proxy.

PuTTY             Yes            Yes             Windows, Linux, Mac OS
    An open source terminal emulator application which can act as a client for the SSH, Telnet, rlogin, and raw TCP computing protocols and as a serial console client.

FileZilla         No             Yes             Windows, Linux, Mac OS
    A fast and reliable cross-platform FTP, FTPS, and SFTP client with lots of useful features and an intuitive graphical user interface.

WinSCP            No             Yes             Windows
    An open source free SFTP client, SCP client, FTPS client, and FTP client for Windows. Its main function is file transfer between a local and a remote computer; beyond this, WinSCP offers scripting and basic file manager functionality.

Note: See the Campus Cluster’s Storage and Data guide for information on transferring files/data to and from the Campus Cluster.


Network Details for Illinois Investors

The Campus Cluster is interconnected with the University of Illinois networks via the Campus Advanced Research Network Environment (CARNE) and is addressed out of fully-accessible public IP space, located outside of the Illinois campus firewall. This positioning of the Campus Cluster outside the campus firewall enables access to regional and national research networks at high speeds and without restrictions. This does mean, however, that for some special use cases where it is necessary for Campus Cluster nodes to initiate communication with hosts on the Illinois campus network (e.g., you are hosting a special license server behind the firewall), you will need to coordinate with your department IT pro to ensure that your hosts are in the appropriate Illinois campus firewall group. Outbound communication from Illinois to the Campus Cluster should work without issue, as well as any communications from the Campus Cluster outbound to regional and national research networks.

2. Managing your Account

When your account is first activated, the default shell is set to bash.

The tcsh shell is also available. To change your shell to tcsh, add the following line:

exec -l /bin/tcsh

to the end of the file named .bash_profile, located in your home ($HOME) directory. To begin using this new shell, you can either log out and then log back in, or execute exec -l /bin/tcsh on your command line.

The Campus Cluster uses the module system to set up the user environment. See the section Managing Your Environment (Modules) for details.

You can reset your NetID password at the Password Management page.

3. Storage

See the new Storage and Data Guide at the link below:

Storage and Data Guide

4. Data Transfer

See the new Storage and Data Guide at the link below:

Storage and Data Guide

5. Managing Your Environment (Modules)

The module command is a user interface to the Modules package. The Modules package provides for the dynamic modification of the user’s environment via modulefiles (a modulefile contains the information needed to configure the shell for an application). Modules are independent of the user’s shell, so both tcsh and bash users can use the same commands to change the environment.

Useful Module commands:

Command                                Description
module avail                           List all available modules
module list                            List currently loaded modules
module help modulefile                 Display help for modulefile
module display modulefile              Display information about modulefile
module load modulefile                 Load modulefile into the current shell environment
module unload modulefile               Remove modulefile from the current shell environment
module swap modulefile1 modulefile2    Unload modulefile1 and load modulefile2

To include particular software in the environment for all new shells, edit your shell configuration file ($HOME/.bashrc for bash users and $HOME/.cshrc for tcsh users) by adding the module commands to load the software that you want to be a part of your environment. After saving your changes, you can source your shell configuration file or log out and then log back in for the changes to take effect.

Note: Order is important. With each module load, the changes are prepended to your current environment paths.
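For example, a bash user who wants a compiler and a matching MPI stack available in every new shell might append lines like the following to $HOME/.bashrc (the module names are examples taken from this guide; run module avail to see what is actually installed):

```shell
# Excerpt from $HOME/.bashrc -- load modules for every new shell.
# Order matters: each module load prepends to your environment paths,
# so load the compiler before the MPI stack that was built with it.
module load intel/18.0
module load mvapich2/2.3-intel-18.0
```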

For additional information on Modules, see the module and modulefile man pages or visit the Modules SourceForge page.

6. Programming Environment

The Intel compilers are available on the Campus Cluster.
module load intel/18.0
[Older versions of the Intel compiler are also available. See the output from the command module avail intel for the specific modules.]

The GNU compilers (GCC) version 4.4.7 are in the default user environment. Version 7.2.0 is also available — load this version with the command:
module load gcc/7.2.0

Compiler Commands

Serial

To build (compile and link) a serial program in Fortran, C, and C++ enter:

           GCC                  Intel Compiler
Fortran:   gfortran myprog.f    ifort myprog.f
C:         gcc myprog.c         icc myprog.c
C++:       g++ myprog.cc        icpc myprog.cc

MPI

To build (compile and link) an MPI program in Fortran, C, or C++:

MPI Implementation   modulefile for MPI/Compiler   Build Commands

MVAPICH2 (Home Page / User Guide)
    mvapich2/2.3-intel-18.0
    mvapich2/2.3-gcc-7.2.0

    Fortran 77: mpif77 myprog.f
    Fortran 90: mpif90 myprog.f90
    C:          mpicc myprog.c
    C++:        mpicxx myprog.cc

Open MPI (Home Page / Documentation)
    openmpi/3.1.1-intel-18.0
    openmpi/3.1.1-gcc-7.2.0

    Same build commands as MVAPICH2 above (mpif77, mpif90, mpicc, mpicxx).

Intel MPI (Home Page / Documentation)
    intel/18.0

                GCC                    Intel Compiler
    Fortran 77: mpif77 myprog.f        mpiifort myprog.f
    Fortran 90: mpif90 myprog.f90      mpiifort myprog.f90
    C:          mpicc myprog.c         mpiicc myprog.c
    C++:        mpicxx myprog.cc       mpiicpc myprog.cc

For example, use the following command to load MVAPICH2 v2.3 built with the Intel 18.0 compiler:

module load mvapich2/2.3-intel-18.0

OpenMP

To build an OpenMP program, use the -fopenmp / -qopenmp option:

             GCC                            Intel Compiler
Fortran 77:  gfortran -fopenmp myprog.f     ifort -qopenmp myprog.f
Fortran 90:  gfortran -fopenmp myprog.f90   ifort -qopenmp myprog.f90
C:           gcc -fopenmp myprog.c          icc -qopenmp myprog.c
C++:         g++ -fopenmp myprog.cc         icpc -qopenmp myprog.cc

Hybrid MPI/OpenMP

To build an MPI/OpenMP hybrid program, use the -fopenmp / -qopenmp option with the MPI compiling commands:

             GCC (MVAPICH2 / Open MPI / Intel MPI)
Fortran 77:  mpif77 -fopenmp myprog.f
Fortran 90:  mpif90 -fopenmp myprog.f90
C:           mpicc -fopenmp myprog.c
C++:         mpicxx -fopenmp myprog.cc

             Intel Compiler (Intel MPI)
Fortran 77:  mpiifort -qopenmp myprog.f
Fortran 90:  mpiifort -qopenmp myprog.f90
C:           mpiicc -qopenmp myprog.c
C++:         mpiicpc -qopenmp myprog.cc

CUDA

NVIDIA GPUs are available as a purchase option of the Campus Cluster. CUDA is a parallel computing platform and programming model from NVIDIA for use on their GPUs. These GPUs support CUDA compute capability 2.0.

Load the CUDA Toolkit into your environment using the following module command:

module load cuda

Libraries

The Intel Math Kernel Library (MKL) contains the complete set of functions from the basic linear algebra subprograms (BLAS), the extended BLAS (sparse), and the complete set of LAPACK routines. In addition, there is a set of fast Fourier transforms (FFT) in single- and double-precision, real and complex data types with both Fortran and C interfaces. The library also includes the cblas interfaces, which allow the C programmer to access all the functionality of the BLAS without considering C-Fortran issues. ScaLAPACK, BLACS and the PARDISO solver are also provided by Intel MKL. MKL provides FFTW interfaces to enable applications using FFTW to gain performance with Intel MKL and without changing the program source code. Both FFTW2 and FFTW3 interfaces are provided as source code wrappers to Intel MKL functions.

Load the Intel compiler module to access MKL.

Use the following -mkl flag options when linking with MKL using the Intel compilers:

Sequential libraries: -mkl=sequential
Threads libraries: -mkl=parallel

To use MKL with GCC, consult the Intel MKL link advisor for the link flags to include.

OpenBLAS, an optimized BLAS library based on GotoBLAS2, is also available. Load the library (version 0.3.12, built with GCC 7.2.0) module with the following command:

module load openblas/0.3.12_sandybridge

Link with the OpenBLAS library using:

-L /usr/local/src/openblas/0.3.12/gcc/Sandy.Bridge/lib -lopenblas

7. Running Jobs

User access to the compute nodes for running jobs is available via batch jobs. The Campus Cluster uses the Slurm Workload Manager to run batch jobs. See the sbatch section under Batch Commands for details on batch job submission.

Please be aware that the interactive (login/head) nodes are a shared resource for all users of the system and their use should be limited to editing, compiling and building your programs, and for short non-intensive runs.

Note: User processes running on the interactive (login/head) nodes are killed automatically if they accrue more than 30 minutes of CPU time or if more than 4 identical processes owned by the same user are running concurrently.

An interactive batch job provides a way to get interactive access to a compute node via a batch job. See the srun or salloc section for information on how to run an interactive job on the compute nodes. Also, a very short time test queue provides quick turnaround time for debugging purposes.

To ensure the health of the batch system and scheduler, users should refrain from having more than 1,000 batch jobs in the queues at any one time.

See the document “Running Serial Jobs Efficiently on the Campus Cluster” regarding information on expediting job turnaround time for serial jobs.

See the Running MATLAB / Mathematica Batch Jobs sections for information on running MATLAB and Mathematica on the campus cluster.

Running Programs

On successful building (compilation and linking) of your program, an executable is created that is used to run the program. The table below describes how to run different types of programs.

Serial
    To run serial code, specify the name of the executable:

        ./a.out

MPI
    MPI programs are run with the srun command followed by the name of the executable.

    Note: The total number of MPI processes is the {number of nodes} x {cores/node} set in the batch job resource specification.

        srun ./a.out

OpenMP
    The OMP_NUM_THREADS environment variable can be set to specify the number of threads used by OpenMP programs. If this variable is not set, the number of threads defaults to one under the Intel compiler; under GCC, the default is one thread for each core available on the node. To run OpenMP programs, specify the name of the executable.

        In bash: export OMP_NUM_THREADS=16
        In tcsh: setenv OMP_NUM_THREADS 16

        ./a.out

MPI/OpenMP
    As with OpenMP programs, the OMP_NUM_THREADS environment variable can be set to specify the number of threads used by the OpenMP portion of a mixed MPI/OpenMP program. The same default behavior applies with respect to the number of threads. Use the srun command followed by the name of the executable to run mixed MPI/OpenMP programs.

    Note: The number of MPI processes per node is set in the batch job resource specification for number of cores/node.

        In bash: export OMP_NUM_THREADS=4
        In tcsh: setenv OMP_NUM_THREADS 4

        srun ./a.out

Primary Queues

Each investor group has unrestricted access to a dedicated primary queue with concurrent access to the number and type of nodes in which they invested.

Users can view the partitions (queues) to which they can submit batch jobs by typing the following command:

[cc-login1 ~]$ sinfo -s -o "%.14R %.12l %.12L %.5D"

Users can also view specific configuration information about the compute nodes associated with their primary partition(s) by typing the following command:

[cc-login1 ~]$ sinfo -p queue(partition)_name -N -o "%.8N %.4c %.16P %.9m %.12l %.12L %G"

Secondary Queues

One of the advantages of the Campus Cluster Program is the ability to share resources. A shared secondary queue will allow users access to any idle nodes in the cluster. Users must have access to a primary queue to be eligible to use the secondary queue.

While each investor has full access to the number and type of nodes in which they invested, those resources not fully utilized by each investor will become eligible to run secondary queue jobs. If there are resources eligible to run secondary queue jobs but there are no jobs to be run from the secondary queue, jobs in the primary queues that fit within the constraints of the secondary queue may be run on any otherwise appropriate idle nodes. The secondary queue uses fairshare scheduling.

The current limits in the secondary queues are below:

Queue Max Walltime Max # Nodes
secondary 4 hours 305
secondary-Eth 4 hours 21

Notes:

    • Jobs are routed to the secondary queue when a queue is not specified; i.e., the secondary queue is the default queue on the Campus Cluster.

    • The difference between the secondary and secondary-Eth queues is that the compute nodes associated with the secondary queue are interconnected
      via InfiniBand (IB), while the compute nodes associated with the secondary-Eth queue are interconnected via Ethernet. Ethernet is currently
      slower than InfiniBand, but this only matters for performance if a batch job uses multiple nodes and needs to communicate between them
      (as with MPI codes), or if it has heavy file system I/O requirements.

Test Queue

A test queue is available to provide quick turnaround time for very short jobs.

The current limits in the test queue are:

Queue Max Walltime Max # Nodes
test 4 hours 2

Batch Commands

Below are brief descriptions of the primary batch commands. For more detailed information, refer to the individual man pages.

sbatch

Note: On Wednesday, September 23, 2020, the Campus Cluster completed its transition from the MOAB/Torque (PBS) batch system to the SLURM batch system.

Batch jobs are submitted through a job script using the sbatch command. Job scripts generally start with a series of SLURM directives that describe the requirements of the job, such as the number of nodes and the wall time required, to the batch system/scheduler. (SLURM directives can also be specified as options on the sbatch command line; command-line options take precedence over those in the script.) The rest of the batch script consists of user commands.

Sample batch scripts are available in the directory /projects/consult/slurm.

The syntax for sbatch is:

sbatch [list of sbatch options] script_name

The main sbatch options are listed below; also see the sbatch man page for the full list of options.

  • The common resource_names are:
    ‑‑time=time

    time=maximum wall clock time (d-hh:mm:ss) [default: maximum limit of the queue(partition) submitted to]

    ‑‑nodes=n

    ‑‑ntasks=p Total number of cores for the batch job

    ‑‑ntasks-per-node=p Number of cores per node (same as ppn under PBS)

    n=number of 16/20/24/28/40-core nodes [default: 1 node]
    p=how many cores(ntasks) per job or per node(ntasks-per-node) to use (1 through 40) [default: 1 core]

    Examples:
    ‑‑time=00:30:00
    ‑‑nodes=2
    ‑‑ntasks=32

    or

    ‑‑time=00:30:00
    ‑‑nodes=2
    ‑‑ntasks-per-node=16

    Memory needs: For investors that have nodes with varying amounts of memory or to run in the secondary queue, nodes with a specific amount of memory can be targeted. The compute nodes have memory configurations of 64GB, 128GB, 192GB, 256GB or 384GB. Not all memory configurations are available in all investor queues. Please check with the technical representative of your investor group to determine what memory configurations are available for the nodes in your primary queue.

    Example:
    ‑‑time=00:30:00
    ‑‑nodes=2
    ‑‑ntasks=32
    ‑‑mem=118000

    or

    ‑‑time=00:30:00
    ‑‑nodes=2
    ‑‑ntasks-per-node=16
    ‑‑mem-per-cpu=7375

    Note: Do not use the memory specification unless absolutely required since it could delay scheduling of the
    job; also, if nodes with the specified memory are unavailable for the specified queue the job will never run.

    Specifying nodes with GPUs: To run jobs on nodes with GPUs, add the resource specification TeslaM2090 (for Tesla M2090), TeslaK40M (for Tesla K40M), K80 (for Tesla K80), P100 (for Tesla P100) or V100 (for Tesla V100) if your primary queue has nodes with multiple types of GPUs, nodes with and without GPUs or if you are submitting jobs to the secondary queue. Through the secondary queue any user can access the nodes that are configured with any of the specific GPUs. Please check with the technical representative of your investor group to determine if GPUs are available on the nodes in your primary queue.

    Example:
    ‑‑gres=gpu:V100

    Note: For investors with all GPU nodes wishing to run in their primary queue, only the queue name specification (via the sbatch -p option below) is required.
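
Putting these options together, a complete job script for a 2-node, 32-task MPI run might look like the sketch below. The job name, module, and executable are placeholders; the scripts in /projects/consult/slurm are the maintained samples to copy from.

```shell
#!/bin/bash
#SBATCH --time=00:30:00              # wall clock limit (30 minutes)
#SBATCH --nodes=2                    # two nodes
#SBATCH --ntasks-per-node=16         # 16 cores per node (32 MPI ranks total)
#SBATCH --job-name="mympijob"        # placeholder job name
#SBATCH --partition=secondary        # queue/partition to submit to
#SBATCH --output=mympijob.o%j        # output file; %j expands to the JobID

# Load the MPI stack the executable was built with (example module).
module load mvapich2/2.3-intel-18.0

srun ./a.out                         # launch the MPI executable
```

Submit the script with sbatch, e.g., sbatch mympijob.sbatch.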

Useful Batch Job Environment Variables
The corresponding PBS environment variables are listed for reference but are no longer valid.

JobID
    SLURM: $SLURM_JOB_ID  (PBS: $PBS_JOBID)
    Job identifier assigned to the job.

Job Submission Directory
    SLURM: $SLURM_SUBMIT_DIR  (PBS: $PBS_O_WORKDIR)
    By default, jobs start in the directory from which the job was submitted, so the cd $SLURM_SUBMIT_DIR command is not needed.

Machine (node) list
    SLURM: $SLURM_NODELIST  (PBS: $PBS_NODEFILE)
    Contains the list of nodes assigned to the batch job.

Array JobID
    SLURM: $SLURM_ARRAY_JOB_ID, $SLURM_ARRAY_TASK_ID  (PBS: $PBS_ARRAYID)
    Each member of a job array is assigned a unique identifier (see the Job Arrays section).

See the sbatch man page for additional environment variables available.

srun

The srun command initiates an interactive job on the compute nodes.

For example, the following command:

[golubh1 ~]$ srun --partition=ncsa --time=00:30:00 --nodes=1 --ntasks-per-node=16 --pty /bin/bash

will run an interactive job in the ncsa queue with a wall clock limit of 30 minutes, using one node and 16 cores per node. You can also use other sbatch options such as those documented above.

After you enter the command, you will have to wait for SLURM to start the job. As with any job, your interactive job will wait in the queue until the specified number of nodes is available. If you specify a small number of nodes for smaller amounts of time, the wait should be shorter because your job will backfill among larger jobs. You will see something like this:

srun: job 123456 queued and waiting for resources

Once the job starts, you will see:

srun: job 123456 has been allocated resources

and will be presented with an interactive shell prompt on the launch node. At this point, you can use the appropriate command to start your program.

When you are done with your runs, you can use the exit command to end the job.

squeue/qstat

Commands that display the status of batch jobs.

    SLURM Example Command      Command Description                                                         Torque/PBS Example Command
    squeue -a                  List the status of all jobs on the system.                                  qstat -a
    squeue -u $USER            List the status of all your jobs in the batch system.                       qstat -u $USER
    squeue -j JobID            List nodes allocated to a running job in addition to basic information.     qstat -n JobID
    scontrol show job JobID    List detailed information on a particular job.                              qstat -f JobID
    sinfo -a                   List summary information on all the queues.                                 qstat -q

See the man page for other options available.

scancel/qdel

The scancel or qdel command deletes a queued job or kills a running job.

  • scancel JobID deletes/kills the specified job.

8. Job Dependencies

SLURM job dependencies allow users to set the execution order in which their queued jobs run. Job dependencies are set by using the ‑‑dependency option with the syntax ‑‑dependency=<dependency type>:<JobID>. SLURM places the dependent jobs in a Hold state until they are eligible to run.

The following are examples of how to specify job dependencies using the afterany dependency type, which indicates to SLURM that the dependent job should become eligible to start only after the specified job has completed.

On the command line:

[golubh1 ~]$ sbatch --dependency=afterany:<JobID> jobscript.sbatch

In a job script:

#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --job-name="myjob"
#SBATCH --partition=secondary
#SBATCH --output=myjob.o%j
#SBATCH --dependency=afterany:<JobID>

In a shell script that submits batch jobs:

#!/bin/bash
JOB_01=`sbatch jobscript1.sbatch |cut -f 4 -d " "`
JOB_02=`sbatch --dependency=afterany:$JOB_01 jobscript2.sbatch |cut -f 4 -d " "`
JOB_03=`sbatch --dependency=afterany:$JOB_02 jobscript3.sbatch |cut -f 4 -d " "`
...

Note: Generally the recommended dependency types to use are after, afterany, afternotok and afterok. While there are additional dependency types, those types that work based on batch job error codes may not behave as expected because of the difference between a batch job error and application errors. See the dependency section of the sbatch manual page for additional information (man sbatch).

9. Job Arrays

If a need arises to submit the same job to the batch system multiple times, instead of issuing one sbatch command for each individual job, users can submit a job array. Job arrays allow users to submit multiple jobs with a single job script using the ‑‑array option to sbatch. An optional slot limit can be specified to limit the number of jobs that can run concurrently in the job array. See the sbatch manual page for details (man sbatch). The file names for the input, output, etc. can be varied for each job using the job array index value defined by the SLURM environment variable SLURM_ARRAY_TASK_ID.

A sample batch script that makes use of job arrays is available in /projects/consult/slurm/jobarray.sbatch.
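
As an illustrative sketch (the input file and executable names are hypothetical), a job array script that gives each task its own input file via SLURM_ARRAY_TASK_ID might look like:

```shell
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --partition=secondary
#SBATCH --array=1-10                 # ten tasks, indices 1 through 10
#SBATCH --output=myjob_%A_%a.out     # %A = array JobID, %a = task index

# Each array task selects its own input file by its task index.
./a.out < input_${SLURM_ARRAY_TASK_ID}.dat
```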

Notes:

  • Valid specifications for job arrays are
    ‑‑array 1-10
    ‑‑array 1,2,6-10
    ‑‑array 8
    ‑‑array 1-100%5
    (a limit of 5 jobs can run concurrently) 

     
  • You should limit the number of batch jobs in the queues at any one time to 1,000 or less. (Each job within a job array is counted as one batch job.)
  • Interactive batch jobs are not supported with job array submissions.
  • To delete job arrays, see the scancel/qdel command section.

10. Running MATLAB Batch Jobs

See the Using MATLAB on the Campus Cluster page for information on running MATLAB batch jobs.

11. Running Mathematica Batch Jobs

Standard batch job

A sample batch script that runs a Mathematica script is available in /projects/consult/slurm/mathematica.sbatch. You can copy and modify this script for your own use. Submit the job with:

[golubh1 ~]$ sbatch mathematica.sbatch

In an interactive batch job

  • For the GUI (which will display on your local machine), use the --x11 option with the srun command:
    srun --x11 --export=ALL --time=00:30:00 --nodes=1 --ntasks-per-node=16 --partition=secondary --pty /bin/bash

    Once the batch job starts, you will have an interactive shell prompt on a compute node. Then type:

    module load mathematica
    mathematica
    

    Note: An X server must be running on your local machine with X11 forwarding enabled within your SSH connection in order to display X apps, GUIs, etc., back on your local machine. Generally, users on Linux-based machines only have to enable X11 forwarding by using the -X option with the ssh command, while users on Windows machines need to ensure that their SSH client has X11 forwarding enabled and that an X server is running. A list of SSH clients (including a combined SSH client and X server package) can be found in the SSH section. Additional information about running X applications can be found on the Using the X Window System page.

  • For the command line interface:

    srun --export=ALL --time=00:30:00 --nodes=1 --ntasks-per-node=16 --partition=secondary --pty /bin/bash

    Once the batch job starts, you will have an interactive shell prompt on a compute node. Then type:

    module load mathematica
    math
    

12. R Software

See the R on the Campus Cluster page for versions available and information on installing add-on packages.

13. Python Software

See the Python on the Campus Cluster page for versions available and information on installing add-on packages.

14. HPC & Other Tutorials

The NSF-funded XSEDE program offers online training on various HPC topics—see XSEDE Online Training for links to the available courses.

An Introduction to Linux course is offered by the Linux Foundation.

15. Investor Specific Information

See here for the technical representative of each investor group and links to investor websites (if available).