- Connecting
- Managing your Account
- Storage
- Data Transfer
- Managing Your Environment (Modules)
- Programming Environment
- Running Jobs
- Job Dependencies
- Job Arrays
- Running MATLAB Batch Jobs
- Running Mathematica Batch Jobs
- R Software
- Python Software
- HPC & Other Tutorials
- Investor Specific Information
1. Connecting
The Campus Cluster can be accessed via Secure Shell (SSH) to the head nodes using your official University NetID and password. Unix/Linux-based systems generally include an SSH client by default; desktops and laptops running versions of Windows prior to Windows 10 version 1803 do not. Third-party software is available for Windows users to access the Campus Cluster. Please see this non-exhaustive list of SSH clients that can be used to access the Campus Cluster.
Below is a list of hostnames that provide round-robin access to head nodes of the Campus Cluster instances as indicated:
Access Method | Hostname | Head Node |
---|---|---|
SSH | cc-login.campuscluster.illinois.edu | namehN (e.g., cc-login1, golubh1) |
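For example, from a terminal on a Unix/Linux or macOS system (replace YourNetID with your own NetID):
ssh YourNetID@cc-login.campuscluster.illinois.edu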
SSH
A variety of SSH-based clients are available for accessing the Campus Cluster from your local system. There are two types of SSH clients: those that support both remote login and data transfer, and those that support data transfer only.
SSH Client | Remote Login | Data Transfer | Installs On |
---|---|---|---|
MobaXterm is an enhanced terminal with an X server and a set of Unix commands (GNU/Cygwin) packaged in the application. | Yes | Yes | Windows |
SSH Secure Shell allows you to securely log in to remote host computers, execute commands safely on a remote computer, and establish secure, encrypted, and authenticated communications between two hosts on an untrusted network. | Yes | Yes | Windows |
Tunnelier is a flexible SSH client that includes terminal emulation, graphical and command-line SFTP support, an FTP-to-SFTP bridge, and additional tunneling features including dynamic port forwarding through an integrated proxy. | Yes | Yes | Windows |
PuTTY is an open-source terminal emulator that can act as a client for the SSH, Telnet, rlogin, and raw TCP protocols and as a serial console client. | Yes | Yes | Windows, Linux, macOS |
FileZilla is a fast and reliable cross-platform FTP, FTPS, and SFTP client with many useful features and an intuitive graphical user interface. | No | Yes | Windows, Linux, macOS |
WinSCP is a free, open-source SFTP, SCP, FTPS, and FTP client for Windows. Its main function is file transfer between a local and a remote computer; beyond this, WinSCP offers scripting and basic file-manager functionality. | No | Yes | Windows |
Note: See the Campus Cluster’s Storage and Data guide for information on transferring files/data to and from the Campus Cluster.
Network Details for Illinois Investors
The Campus Cluster is interconnected with the University of Illinois networks via the Campus Advanced Research Network Environment (CARNE) and is addressed out of fully-accessible public IP space, located outside of the Illinois campus firewall. This positioning of the Campus Cluster outside the campus firewall enables access to regional and national research networks at high speeds and without restrictions. This does mean, however, that for some special use cases where it is necessary for Campus Cluster nodes to initiate communication with hosts on the Illinois campus network (e.g., you are hosting a special license server behind the firewall), you will need to coordinate with your department IT pro to ensure that your hosts are in the appropriate Illinois campus firewall group. Outbound communication from Illinois to the Campus Cluster should work without issue, as well as any communications from the Campus Cluster outbound to regional and national research networks.
2. Managing your Account
When your account is first activated, the default shell is set to bash.
The tcsh shell is also available. To change your shell to tcsh, add the following line:
exec -l /bin/tcsh
to the end of the file named .bash_profile, located in your home ($HOME) directory. To begin using this new shell, you can either log out and then log back in, or execute exec -l /bin/tcsh on your command line.
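For example, a quick way to append that line from a bash prompt (a one-line sketch that simply adds the exec command to the end of your existing .bash_profile):
echo 'exec -l /bin/tcsh' >> $HOME/.bash_profile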
The Campus Cluster uses the module system to set up the user environment. See the section Managing Your Environment (Modules) for details.
You can reset your NetID password at the Password Management page.
3. Storage
See the new Storage and Data Guide at the link below:
Storage and Data Guide
4. Data Transfer
See the new Storage and Data Guide at the link below:
Storage and Data Guide
5. Managing Your Environment (Modules)
The module command is a user interface to the Modules package. The Modules package provides for the dynamic modification of the user’s environment via modulefiles (a modulefile contains the information needed to configure the shell for an application). Modules are independent of the user’s shell, so both tcsh and bash users can use the same commands to change the environment.
Useful Module commands:
Command | Description |
---|---|
module avail | Lists all available modules |
module list | Lists currently loaded modules |
module help modulefile | Displays help for modulefile |
module display modulefile | Displays information about modulefile |
module load modulefile | Loads modulefile into the current shell environment |
module unload modulefile | Removes modulefile from the current shell environment |
module swap modulefile1 modulefile2 | Unloads modulefile1 and loads modulefile2 |
To include particular software in the environment for all new shells, edit your shell configuration file ($HOME/.bashrc for bash users and $HOME/.cshrc for tcsh users) by adding the module commands to load the software that you want to be a part of your environment. After saving your changes, you can source your shell configuration file or log out and then log back in for the changes to take effect.
Note: Order is important. With each module load, the changes are prepended to your current environment paths.
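For example, bash users might add lines such as the following to the end of $HOME/.bashrc (a sketch only; the modules shown are examples from this guide, so substitute the ones you actually need, loading a compiler before an MPI library built with it):
module load intel/18.0
module load mvapich2/2.3-intel-18.0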
For additional information on Modules, see the module and modulefile man pages or visit the Modules SourceForge page.
6. Programming Environment
The Intel compilers are available on the Campus Cluster; load them with the command:
module load intel/18.0
[Older versions of the Intel compiler are also available. See the output from the command module avail intel for the specific modules.]
The GNU compilers (GCC) version 4.4.7 are in the default user environment. Version 7.2.0 is also available; load it with the command:
module load gcc/7.2.0
Compiler Commands
Serial
To build (compile and link) a serial program in Fortran, C, or C++, enter:
GCC | Intel Compiler |
---|---|
gfortran myprog.f | ifort myprog.f |
gcc myprog.c | icc myprog.c |
g++ myprog.cc | icpc myprog.cc |
MPI
To build (compile and link) an MPI program in Fortran, C, or C++:
MPI Implementation | modulefile for MPI/Compiler | Build Commands |
---|---|---|
MVAPICH2 (Home Page / User Guide) | mvapich2/2.3-intel-18.0 mvapich2/2.3-gcc-7.2.0 | mpif77 myprog.f; mpif90 myprog.f90; mpicc myprog.c; mpicxx myprog.cc |
Open MPI (Home Page / Documentation) | openmpi/3.1.1-intel-18.0 openmpi/3.1.1-gcc-7.2.0 | mpif77 myprog.f; mpif90 myprog.f90; mpicc myprog.c; mpicxx myprog.cc |
Intel MPI (Home Page / Documentation) | intel/18.0 | mpiifort myprog.f; mpiicc myprog.c; mpiicpc myprog.cc |
For example, use the following command to load MVAPICH2 v2.3 built with the Intel 18.0 compiler:
module load mvapich2/2.3-intel-18.0
OpenMP
To build an OpenMP program, use the -fopenmp / -qopenmp option:
GCC | Intel Compiler |
---|---|
gfortran -fopenmp myprog.f | ifort -qopenmp myprog.f |
gfortran -fopenmp myprog.f90 | ifort -qopenmp myprog.f90 |
gcc -fopenmp myprog.c | icc -qopenmp myprog.c |
g++ -fopenmp myprog.cc | icpc -qopenmp myprog.cc |
Hybrid MPI/OpenMP
To build an MPI/OpenMP hybrid program, use the -fopenmp / -qopenmp option with the MPI compiling commands:
GCC (MVAPICH2 / Open MPI) | Intel Compiler (MVAPICH2 / Open MPI) | Intel Compiler (Intel MPI) |
---|---|---|
mpif77 -fopenmp myprog.f | mpif77 -qopenmp myprog.f | mpiifort -qopenmp myprog.f |
mpif90 -fopenmp myprog.f90 | mpif90 -qopenmp myprog.f90 | mpiifort -qopenmp myprog.f90 |
mpicc -fopenmp myprog.c | mpicc -qopenmp myprog.c | mpiicc -qopenmp myprog.c |
mpicxx -fopenmp myprog.cc | mpicxx -qopenmp myprog.cc | mpiicpc -qopenmp myprog.cc |
CUDA
NVIDIA GPUs are available as a purchase option on the Campus Cluster. CUDA is a parallel computing platform and programming model from NVIDIA for use on its GPUs. These GPUs support CUDA compute capability 2.0.
Load the CUDA Toolkit into your environment using the following module command:
module load cuda
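Once the module is loaded, a CUDA source file can be compiled with nvcc, the compiler driver included in the CUDA Toolkit (a minimal sketch; myprog.cu and the output name are placeholders):
nvcc -o myprog myprog.cu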
Libraries
The Intel Math Kernel Library (MKL) contains the complete set of functions from the basic linear algebra subprograms (BLAS), the extended BLAS (sparse), and the complete set of LAPACK routines. In addition, there is a set of fast Fourier transforms (FFT) in single- and double-precision, real and complex data types with both Fortran and C interfaces. The library also includes the cblas interfaces, which allow the C programmer to access all the functionality of the BLAS without considering C-Fortran issues. ScaLAPACK, BLACS and the PARDISO solver are also provided by Intel MKL. MKL provides FFTW interfaces to enable applications using FFTW to gain performance with Intel MKL and without changing the program source code. Both FFTW2 and FFTW3 interfaces are provided as source code wrappers to Intel MKL functions.
Load the Intel compiler module to access MKL.
Use the following -mkl flag options when linking with MKL using the Intel compilers:
Sequential libraries: -mkl=sequential
Threaded libraries: -mkl=parallel
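For example, to compile a C program and link it against the sequential MKL libraries with the Intel compiler (a sketch; myprog.c is a placeholder source file):
module load intel/18.0
icc myprog.c -mkl=sequential -o myprog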
To use MKL with GCC, consult the Intel MKL link advisor for the link flags to include.
OpenBLAS, an optimized BLAS library based on GotoBLAS2, is also available. Load the library module (version 0.3.12, built with GCC 7.2.0) with the following command:
module load openblas/0.3.12_sandybridge
Link with the OpenBLAS library using
-L /usr/local/src/openblas/0.3.12/gcc/Sandy.Bridge/lib -lopenblas
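For example, a GCC build linked against OpenBLAS might look like the following (a sketch; myprog.c is a placeholder, and loading gcc/7.2.0 matches the compiler the library was built with):
module load gcc/7.2.0
module load openblas/0.3.12_sandybridge
gcc myprog.c -L /usr/local/src/openblas/0.3.12/gcc/Sandy.Bridge/lib -lopenblas -o myprog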
7. Running Jobs
User access to the compute nodes for running jobs is available via batch jobs. The Campus Cluster uses the Slurm Workload Manager to run batch jobs. See the sbatch section under Batch Commands for details on batch job submission.
Please be aware that the interactive (login/head) nodes are a shared resource for all users of the system and their use should be limited to editing, compiling and building your programs, and for short non-intensive runs.
Note: User processes running on the interactive (login/head) nodes are killed automatically if they accrue more than 30 minutes of CPU time or if more than 4 identical processes owned by the same user are running concurrently.
An interactive batch job provides a way to get interactive access to a compute node via a batch job. See the srun or salloc section for information on how to run an interactive job on the compute nodes. A test queue with a very short time limit also provides quick turnaround for debugging purposes.
To ensure the health of the batch system and scheduler, users should refrain from having more than 1,000 batch jobs in the queues at any one time.
See the document “Running Serial Jobs Efficiently on the Campus Cluster” regarding information on expediting job turnaround time for serial jobs.
See the Running MATLAB / Mathematica Batch Jobs sections for information on running MATLAB and Mathematica on the campus cluster.
Running Programs
On successful building (compilation and linking) of your program, an executable is created that is used to run the program. The table below describes how to run different types of programs.
Program Type | How to run the program/executable | Example Command |
---|---|---|
Serial | To run serial code, specify the name of the executable. | ./a.out |
MPI | MPI programs are run with the srun command followed by the name of the executable. Note: the total number of MPI processes is the {number of nodes} x {cores/node} set in the batch job resource specification. | srun ./a.out |
OpenMP | The OMP_NUM_THREADS environment variable can be set to specify the number of threads used by OpenMP programs. If this variable is not set, the number of threads defaults to one under the Intel compiler; under GCC, the default is one thread for each core available on the node. To run an OpenMP program, specify the name of the executable. | In bash: export OMP_NUM_THREADS=16 then ./a.out. In tcsh: setenv OMP_NUM_THREADS 16 then ./a.out |
MPI/OpenMP | As with OpenMP programs, the OMP_NUM_THREADS environment variable can be set to specify the number of threads used by the OpenMP portion of a mixed MPI/OpenMP program; the same defaults apply. Use the srun command followed by the name of the executable. Note: the number of MPI processes per node is set in the batch job resource specification for cores/node. | In bash: export OMP_NUM_THREADS=4 then srun ./a.out. In tcsh: setenv OMP_NUM_THREADS 4 then srun ./a.out |
Primary Queues
Each investor group has unrestricted access to a dedicated primary queue with concurrent access to the number and type of nodes in which they invested.
Users can view the partitions (queues) to which they can submit batch jobs by typing the following command:
[cc-login1 ~]$ sinfo -s -o "%.16R %.12l %.12L %.5D"
Users can also view specific configuration information about the compute nodes associated with their primary partition(s) by typing the following command:
[cc-login1 ~]$ sinfo -p queue(partition)_name -N -o "%.8N %.4c %.16P %.9m %.12l %.12L %G"
Secondary Queues
One of the advantages of the Campus Cluster Program is the ability to share resources. A shared secondary queue will allow users access to any idle nodes in the cluster. Users must have access to a primary queue to be eligible to use the secondary queue.
While each investor has full access to the number and type of nodes in which they invested, those resources not fully utilized by each investor will become eligible to run secondary queue jobs. If there are resources eligible to run secondary queue jobs but there are no jobs to be run from the secondary queue, jobs in the primary queues that fit within the constraints of the secondary queue may be run on any otherwise appropriate idle nodes. The secondary queue uses fairshare scheduling.
The current limits in the secondary queues are below:
Queue | Max Walltime | Max # Nodes |
---|---|---|
secondary | 4 hours | 305 |
secondary-Eth | 4 hours | 21 |
Notes:
- Jobs are routed to the secondary queue when a queue is not specified, i.e., the secondary queue is the default queue on the Campus Cluster.
- The difference between the secondary and secondary-Eth queues is that the compute nodes associated with the secondary queue are interconnected via InfiniBand (IB), while the compute nodes associated with the secondary-Eth queue are interconnected via Ethernet. Ethernet is currently slower than InfiniBand, but this only matters for performance if batch jobs use multiple nodes and need to communicate between nodes (as with MPI codes) or have heavy file system I/O requirements.
Test Queue
A test queue is available to provide quick turnaround time for very short jobs.
The current limits in the test queue are:
Queue | Max Walltime | Max # Nodes |
---|---|---|
test | 4 hours | 2 |
Batch Commands
Below are brief descriptions of the primary batch commands. For more detailed information, refer to the individual man pages.
sbatch
Note: On Wednesday, September 23, 2020, the Campus Cluster completed its transition from the MOAB/Torque (PBS) batch system to the SLURM batch system.
Batch jobs are submitted through a job script using the sbatch command. Job scripts generally start with a series of SLURM directives that describe requirements of the job such as number of nodes, wall time required, etc… to the batch system/scheduler (SLURM directives can also be specified as options on the sbatch command line; command line options take precedence over those in the script). The rest of the batch script consists of user commands.
Sample batch scripts are available in the directory /projects/consult/slurm.
The syntax for sbatch is:
sbatch [list of sbatch options] script_name
The main sbatch options are listed below; also see the sbatch man page for additional options.
The common resource options are:
‑‑time=time
‑‑nodes=n
‑‑ntasks=p Total number of cores for the batch job
‑‑ntasks-per-node=p Number of cores per node (same as ppn under PBS)
where
time = maximum wall clock time (d-hh:mm:ss) [default: maximum limit of the queue (partition) submitted to]
n = number of 16/20/24/28/40/128-core nodes [default: 1 node]
p = how many cores per job (ntasks) or per node (ntasks-per-node) to use (1 through 40) [default: 1 core]
Examples:
‑‑time=00:30:00
‑‑nodes=2
‑‑ntasks=32
or
‑‑time=00:30:00
‑‑nodes=2
‑‑ntasks-per-node=16
Memory needs: For investors that have nodes with varying amounts of memory or to run in the secondary queue, nodes with a specific amount of memory can be targeted. The compute nodes have memory configurations of 64GB, 128GB, 192GB, 256GB or 384GB. Not all memory configurations are available in all investor queues. Please check with the technical representative of your investor group to determine what memory configurations are available for the nodes in your primary queue.
Example:
‑‑time=00:30:00
‑‑nodes=2
‑‑ntasks=32
‑‑mem=118000
or
‑‑time=00:30:00
‑‑nodes=2
‑‑ntasks-per-node=16
‑‑mem-per-cpu=7375
Note: Do not use the memory specification unless absolutely required, since it could delay scheduling of the job; also, if nodes with the specified memory are unavailable for the specified queue, the job will never run.
Specifying nodes with GPUs: To run jobs on nodes with GPUs, add the resource specification TeslaM2090 (for Tesla M2090), TeslaK40M (for Tesla K40M), K80 (for Tesla K80), P100 (for Tesla P100), V100 (for Tesla V100), TeslaT4 (for Tesla T4), or A40 (for Tesla A40) if your primary queue has nodes with multiple types of GPUs, nodes with and without GPUs, or if you are submitting jobs to the secondary queue. Through the secondary queue, any user can access the nodes that are configured with any of the specific GPUs.
Example:
‑‑gres=gpu:V100
or
‑‑gres=gpu:V100:2
to specify two V100 GPUs (default is 1 if no number is specified after the gpu type). Note: Requesting more GPUs than what is available on a single compute node will result in a failed batch job submission.
To determine if GPUs are available on any of the compute nodes in your group's partition (queue), run the command:
sinfo -p queue(partition)_name -N -o "%.8N %.4c %.16G %.16P %50f"
or check with the technical representative of your investor group.
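Putting these options together, a minimal MPI batch script might look like the following (a sketch only; the job name, partition, module, and executable are placeholders, and the sample scripts in /projects/consult/slurm are the recommended starting point):
#!/bin/bash
#SBATCH --job-name=mympi
#SBATCH --partition=secondary
#SBATCH --time=00:30:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --output=mympi.o%j

# Load the MPI stack the executable was built with (example module from this guide)
module load mvapich2/2.3-intel-18.0

# Launch the MPI executable on the allocated nodes/cores
srun ./a.out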
Useful Batch Job Environment Variables
Description | SLURM Environment Variable | Detail Description | PBS Environment Variable (no longer valid) |
---|---|---|---|
JobID | $SLURM_JOB_ID | Job identifier assigned to the job | $PBS_JOBID |
Job Submission Directory | $SLURM_SUBMIT_DIR | By default, jobs start in the directory the job was submitted from. So the cd $SLURM_SUBMIT_DIR command is not needed. | $PBS_O_WORKDIR |
Machine (node) list | $SLURM_NODELIST | Variable that contains the list of nodes assigned to the batch job | $PBS_NODEFILE |
Array JobID | $SLURM_ARRAY_JOB_ID $SLURM_ARRAY_TASK_ID | Each member of a job array is assigned a unique identifier (see the Job Arrays section) | $PBS_ARRAYID |
See the sbatch man page for additional environment variables available.
srun
The srun command initiates an interactive job on the compute nodes.
For example, the following command:
[golubh1 ~]$ srun --partition=ncsa --time=00:30:00 --nodes=1 --ntasks-per-node=16 --pty /bin/bash
will run an interactive job in the ncsa queue with a wall clock limit of 30 minutes, using one node and 16 cores per node. You can also use other sbatch options such as those documented above.
After you enter the command, you will have to wait for SLURM to start the job. As with any job, your interactive job will wait in the queue until the specified number of nodes is available. If you specify a small number of nodes for smaller amounts of time, the wait should be shorter because your job will backfill among larger jobs. You will see something like this:
srun: job 123456 queued and waiting for resources
Once the job starts, you will see:
srun: job 123456 has been allocated resources
and will be presented with an interactive shell prompt on the launch node. At this point, you can use the appropriate command to start your program.
When you are done with your runs, you can use the exit command to end the job.
squeue/qstat
Commands that display the status of batch jobs.
SLURM Example Command | Command Description | Torque/PBS Example Command |
---|---|---|
squeue -a | List the status of all jobs on the system. | qstat -a |
squeue -u $USER | List the status of all your jobs in the batch system. | qstat -u $USER |
squeue -j JobID | List nodes allocated to a running job in addition to basic information. | qstat -n JobID |
scontrol show job JobID | List detailed information on a particular job. | qstat -f JobID |
sinfo -a | List summary information on all the queues. | qstat -q |
See the man page for other options available.
scancel/qdel
The scancel or qdel command deletes a queued job or kills a running job.
- scancel JobID deletes/kills a job.
8. Job Dependencies
SLURM job dependencies allow users to set the order in which their queued jobs run. Job dependencies are set by using the ‑‑dependency option with the syntax ‑‑dependency=<dependency type>:<JobID>. SLURM places dependent jobs in a Hold state until they are eligible to run.
The following are examples of how to specify job dependencies using the afterany dependency type, which indicates to SLURM that the dependent job should become eligible to start only after the specified job has completed.
On the command line:
[golubh1 ~]$ sbatch --dependency=afterany:<JobID> jobscript.sbatch
In a job script:
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --job-name="myjob"
#SBATCH --partition=secondary
#SBATCH --output=myjob.o%j
#SBATCH --dependency=afterany:<JobID>
In a shell script that submits batch jobs:
#!/bin/bash
JOB_01=`sbatch jobscript1.sbatch | cut -f 4 -d " "`
JOB_02=`sbatch --dependency=afterany:$JOB_01 jobscript2.sbatch | cut -f 4 -d " "`
JOB_03=`sbatch --dependency=afterany:$JOB_02 jobscript3.sbatch | cut -f 4 -d " "`
...
Note: Generally the recommended dependency types to use are after, afterany, afternotok and afterok. While there are additional dependency types, those types that work based on batch job error codes may not behave as expected because of the difference between a batch job error and application errors. See the dependency section of the sbatch manual page for additional information (man sbatch).
9. Job Arrays
If a need arises to submit the same job to the batch system multiple times, instead of issuing one sbatch command for each individual job, users can submit a job array. Job arrays allow users to submit multiple jobs with a single job script using the ‑‑array option to sbatch. An optional slot limit can be specified to limit the number of jobs that can run concurrently in the job array. See the sbatch manual page for details (man sbatch). The file names for the input, output, etc. can be varied for each job using the job array index value defined by the SLURM environment variable SLURM_ARRAY_TASK_ID.
A sample batch script that makes use of job arrays is available in /projects/consult/slurm/jobarray.sbatch.
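As a sketch of the idea (the queue, array range, and input file names are placeholders), a job array script can use SLURM_ARRAY_TASK_ID to give each array member its own input file:
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --partition=secondary
#SBATCH --array=1-10
#SBATCH --output=myjob.o%A_%a

# Each array task reads a different input file, e.g., input_1.dat through input_10.dat
./a.out input_${SLURM_ARRAY_TASK_ID}.dat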
Notes:
- Valid specifications for job arrays are:
‑‑array 1-10
‑‑array 1,2,6-10
‑‑array 8
‑‑array 1-100%5 (a limit of 5 jobs can run concurrently)
- You should limit the number of batch jobs in the queues at any one time to 1,000 or less. (Each job within a job array is counted as one batch job.)
- Interactive batch jobs are not supported with job array submissions.
- To delete job arrays, see the scancel/qdel command section.
10. Running MATLAB Batch Jobs
See the Using MATLAB on the Campus Cluster page for information on running MATLAB batch jobs.
11. Running Mathematica Batch Jobs
Standard batch job
A sample batch script that runs a Mathematica script is available in /projects/consult/slurm/mathematica.sbatch. You can copy and modify this script for your own use. Submit the job with:
[golubh1 ~]$ sbatch mathematica.sbatch
In an interactive batch job
- For the GUI (which will display on your local machine), use the --x11 option with the srun command:
srun --x11 --export=All --time=00:30:00 --nodes=1 --ntasks-per-node=16 --partition=secondary --pty /bin/bash
Once the batch job starts, you will have an interactive shell prompt on a compute node. Then type:
module load mathematica
mathematica
Note: An X server must be running on your local machine with X11 forwarding enabled within your SSH connection in order to display X apps, GUIs, etc. back on your local machine. Generally, users on Linux-based machines only have to enable X11 forwarding by using the -X option with the ssh command, while users on Windows machines will need to ensure that their SSH client has X11 forwarding enabled and an X server is running. A list of SSH clients (including a combined SSH client and X server package) can be found in the SSH section. Additional information about running X applications can be found on the Using the X Window System page.
- For the command-line interface:
srun --export=All --time=00:30:00 --nodes=1 --ntasks-per-node=16 --partition=secondary --pty /bin/bash
Once the batch job starts, you will have an interactive shell prompt on a compute node. Then type:
module load mathematica
math
12. R Software
See the R on the Campus Cluster page for versions available and information on installing add-on packages.
13. Python Software
See the Python on the Campus Cluster page for versions available and information on installing add-on packages.
14. HPC & Other Tutorials
The NSF-funded ACCESS program offers training on various HPC topics; see ACCESS Training and Events for links to the available courses.
Introduction to Linux is offered by the Linux Foundation.
15. Investor Specific Information
See here for the technical representative of each investor group and links to investor web sites (if available).