1. Connecting
  2. Managing your Account
  3. Storage
  4. Data Transfer
  5. Managing Your Environment (Modules)
  6. Programming Environment
  7. Running Jobs
  8. Job Dependencies
  9. Job Arrays
  10. Running MATLAB Batch Jobs
  11. Running Mathematica Batch Jobs
  12. R Software
  13. HPC & Other Tutorials
  14. Investor Specific Information

1. Connecting

The Campus Cluster can be accessed via Secure Shell (SSH) to the head nodes using your official University NetID login and password. Unix/Linux-based systems generally include an SSH client by default; Windows users will need to install third-party software to access the Campus Cluster. Please see this non-exhaustive list of SSH clients that can be used to access the Campus Cluster.

Below is a list of hostnames that provide round-robin access to the head nodes of the Campus Cluster instances as indicated:

    Access Method   Hostname                              Head Node
    SSH             cc-login.campuscluster.illinois.edu   namehN (ex. golubh1)

Network Details for Illinois Investors

The Campus Cluster is interconnected with the University of Illinois networks via the Campus Advanced Research Network Environment (CARNE) and is addressed out of fully accessible public IP space located outside the Illinois campus firewall. This positioning outside the campus firewall enables access to regional and national research networks at high speeds and without restrictions. It does mean, however, that for some special use cases where Campus Cluster nodes must initiate communication with hosts on the Illinois campus network (e.g., you are hosting a special license server behind the firewall), you will need to coordinate with your department IT pro to ensure that your hosts are in the appropriate Illinois campus firewall group. Outbound communication from Illinois to the Campus Cluster should work without issue, as should communication from the Campus Cluster outbound to regional and national research networks.

2. Managing your Account

When your account is first activated, the default shell is set to bash.

The tcsh shell is also available. To change your shell to tcsh, add the following line:

exec -l /bin/tcsh

to the end of the file named .bash_profile, located in your home ($HOME) directory. To begin using this new shell, you can either log out and then log back in, or execute exec -l /bin/tcsh on your command line.
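For example, assuming bash is your current login shell, the line can be appended from the command line as follows (a minimal sketch):

    echo 'exec -l /bin/tcsh' >> ~/.bash_profile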

The Campus Cluster uses the module system to set up the user environment. See the section Managing Your Environment (Modules) for details.

You can reset your NetID password at the Password Management page.

3. Storage

Home Directory

Your home directory is the default directory you are placed in when you log on. You should use this space for storing files you want to keep long term such as source code, scripts, etc. Every user has a 2 GB home directory quota.

The soft limit is 2 GB and the hard limit is 4 GB. Under the quota rules, if the amount of data in your home directory is over the soft limit of 2 GB but under the hard limit of 4 GB, there is a grace period of 7 days to get under the soft limit. When the grace period expires, you will not be able to write new files or update any current files until you reduce the amount of data to below 2 GB.

The command to see your disk usage and limits is quota.

Example:

    [jdoe@golubh4 ~]$ quota
    Directories quota usage for user jdoe:

    -------------------------------------------------------------------------------------
    |      Fileset       |  Used   |  Soft   |  Hard   |   Used   |   Soft   |   Hard   |
    |                    |  Block  |  Quota  |  Limit  |   File   |   Quota  |   Limit  |
    -------------------------------------------------------------------------------------
    | home               | 501.1M  | 2G      | 4G      | 14       | 0        | 0        |
    | scratch            | 1.249G  | 10T     | 10T     | 206088   | 0        | 0        |
    | cse-shared         | 0       | 1.465T  | 1.953T  | 1        | 0        | 0        |
    -------------------------------------------------------------------------------------

Home directories are backed up using snapshots.

Scratch Directory

The scratch filesystem is shared storage space available to all users. It is intended for short term use and should be considered volatile. No backups of any kind are performed for this storage. There is a soft link named scratch in your home directory that points to your scratch directory.

Scratch Purge Policy:
All files located in scratch (/scratch/users) that are older than 30 days will be purged (deleted).
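To see which of your files are approaching the purge window, a find command along the following lines can be used (a sketch; it assumes the scratch soft link in your home directory described above):

    # List files under your scratch directory that have not been modified in 30+ days
    find ~/scratch/ -type f -mtime +30 -ls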

Project Space

For investors that have project space (/projects/investor_group_name), usage and quota information is available with the command:

[golubh1 ~]$ projectquota <project_directory_name>

Please consult with your investor technical representative regarding availability and access.

Snapshots

Nightly snapshots of the home and project filesystems are available for the last 30 days in the following locations:

  • Home Directory: /gpfs/iccp/home/.snapshots/home_YYYYMMDD*/$USER
  • Investor Project Directory: /gpfs/iccp/projects/<project_directory_name>/.snapshots/<project_directory_name>_YYYYMMDD*

Note: Since snapshots are created nightly, there is a window of time between snapshots when recent file changes are NOT recoverable if accidentally deleted, overwritten, etc.
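For example, a file accidentally deleted from your home directory could be recovered by copying it back from one of the snapshot locations listed above (a sketch; the date suffix and the file name myfile.txt are placeholders to replace with real values):

    # List the available home directory snapshots, then copy the file back
    ls /gpfs/iccp/home/.snapshots/
    cp /gpfs/iccp/home/.snapshots/home_YYYYMMDD*/$USER/myfile.txt ~/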

No off-site backups for disaster recovery are provided for any storage. Please make sure to do your own backups of any important data on the Campus Cluster to permanent storage as often as necessary.

Data Compression

To reduce space usage in your home directory, one option for files that are not in active use is to compress them. The gzip utility can be used for file compression and decompression. An alternative is bzip2, which usually yields a better compression ratio than gzip but takes longer to complete. Additionally, files that are typically used together can first be combined into a single file with the tar utility and then compressed.

Examples:

    • Compress a file largefile.dat using gzip:
      gzip largefile.dat
      The original file is replaced by a compressed file named largefile.dat.gz.

    • To uncompress the file:
      gunzip largefile.dat.gz (or: gzip -d largefile.dat.gz)

    • To combine the contents of a subdirectory named largedir and compress it:
      tar -zcvf largedir.tgz largedir
      [The convention is to use the extension .tgz in the file name.]
      Note: If the files to be combined are in your home directory and you are close to the quota, create the tar file in the scratch directory (the tar command may fail prior to completion if you go over quota):
      tar -zcvf ~/scratch/largedir.tgz largedir

    • To extract the contents of the compressed tar file:
      tar -xvf largedir.tgz

See the manual pages (man gzip, man bzip2, man tar) for more details on these utilities.

Notes:

  • ASCII text and binary files like executables can yield good compression ratios. Image file formats (gif, jpg, png, etc.) are already natively compressed, so further compression will not yield much additional gain.
  • Depending on the size of the files, the compression utilities can be compute intensive and take a while to complete. Use the compute nodes via a batch job for compressing large files (see the sample batch script after this list).
  • With gzip, the file is replaced by one with the extension .gz. When using tar the individual files remain—these can be deleted to conserve space once the compressed tar file is created successfully.
  • Use of tar and compression could also make data transfers between the Campus Cluster and other resources more efficient.
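As mentioned in the notes above, large compression jobs belong on the compute nodes. A minimal batch script for that purpose might look like the following sketch (the walltime and the directory name largedir are placeholders; the PBS directives and $PBS_O_WORKDIR are described in the Running Jobs section below):

    #!/bin/bash
    #PBS -l walltime=01:00:00
    #PBS -l nodes=1:ppn=1
    #PBS -N compress
    #PBS -j oe

    # Run from the directory the job was submitted from
    cd $PBS_O_WORKDIR

    # Combine and compress a large subdirectory, writing the archive to scratch
    tar -zcvf ~/scratch/largedir.tgz largedir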

4. Data Transfer

Campus Cluster data transfers can be initiated via Globus Online's GridFTP data transfer utility as well as the SSH-based tools scp (Secure Copy) and sftp (Secure FTP).

GridFTP

The Illinois Campus Cluster Program recommends using Globus Online for Campus Cluster large data transfers. Globus Online manages the data transfer operation for the user: monitoring performance, retrying failures, auto-tuning and recovering from faults automatically where possible, and reporting status. Email is sent when the transfer is complete.

Globus Online implements data transfer between machines through a web interface using the GridFTP protocol. There is a predefined GridFTP endpoint for the Illinois Campus Cluster Program to allow data movement between the Campus Cluster and other resources registered with Globus Online.

To transfer data between the Campus Cluster and a non-registered resource, Globus Online provides a software package called Globus Connect that allows for the creation of a personal GridFTP endpoint on virtually any local (non-Campus Cluster) resource.

Steps to use Globus Online (GO) for Campus Cluster data transfers:

Data transfer to an existing GO endpoint:
  • Type in or select one of your target endpoints from the 1st pull-down selection box.
  • Activate the endpoint.

Create a new GO endpoint for data transfers:
  • Download and install the Globus Connect software for your OS.
    Note: The Globus Connect software should be installed on the machine that you want to set up as an endpoint.
  • Type the endpoint name that you created during the Globus Connect installation into the 1st endpoint selection box.

In either case, then:
  • Type in or select “illinois#iccp” for your other endpoint in the 2nd pull-down selection box.
  • Activate the Illinois Campus Cluster endpoint by authenticating with your official University NetID and NetID password.
  • Highlight the data to be transferred and click the appropriate transfer arrow between the two endpoint selection boxes.

SSH

For initiating data transfers from the Campus Cluster, the SSH based tools sftp (Secure FTP) or scp (Secure Copy) can be used.

A variety of SSH-based clients are available for initiating transfers from your local system. There are two types of SSH clients: those that support both remote login and data transfer, and those that support data transfer only.

SSH Clients (Remote Login / Data Transfer / Installs On):

  • MobaXterm: an enhanced terminal with an X server and a set of Unix commands (GNU/Cygwin) packaged in the application. Remote login: Yes. Data transfer: Yes. Installs on: Windows.
  • SSH Secure Shell: allows you to securely log in to remote host computers, execute commands safely on a remote computer, and provides secure, encrypted, authenticated communications between two hosts in an untrusted network. Remote login: Yes. Data transfer: Yes. Installs on: Windows.
  • Tunnelier: a flexible SSH client which includes terminal emulation, graphical as well as command-line SFTP support, an FTP-to-SFTP bridge, and additional tunneling features including dynamic port forwarding through an integrated proxy. Remote login: Yes. Data transfer: Yes. Installs on: Windows.
  • PuTTY: an open source terminal emulator application which can act as a client for the SSH, Telnet, rlogin, and raw TCP computing protocols and as a serial console client. Remote login: Yes. Data transfer: Yes*. Installs on: Windows, Linux, Mac OS.
  • FileZilla: a fast and reliable cross-platform FTP, FTPS, and SFTP client with lots of useful features and an intuitive graphical user interface. Remote login: No. Data transfer: Yes. Installs on: Windows, Linux, Mac OS.
  • WinSCP: an open source, free SFTP, SCP, FTPS, and FTP client for Windows. Its main function is file transfer between a local and a remote computer; beyond this, WinSCP offers scripting and basic file manager functionality. Remote login: No. Data transfer: Yes. Installs on: Windows.
  • FireFTP: a free, secure, cross-platform FTP/SFTP client for Mozilla Firefox which provides easy and intuitive access to FTP/SFTP servers. Remote login: No. Data transfer: Yes. Installs on: Firefox (add-on).

* PuTTY’s scp and sftp data transfer functionality is implemented via Command Line Interface (CLI) by default.
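For example, command-line transfers initiated from a local Linux or Mac OS machine might look like the following (a sketch; NetID and the file and directory names are placeholders, and the hostname is the login host listed in the Connecting section):

    # Copy a local file to your Campus Cluster home directory
    scp mydata.tar.gz NetID@cc-login.campuscluster.illinois.edu:~/

    # Copy a results directory from the Campus Cluster back to the local machine
    scp -r NetID@cc-login.campuscluster.illinois.edu:~/results .

    # Or start an interactive sftp session
    sftp NetID@cc-login.campuscluster.illinois.edu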

5. Managing Your Environment (Modules)

The module command is a user interface to the Modules package. The Modules package provides for the dynamic modification of the user’s environment via modulefiles (a modulefile contains the information needed to configure the shell for an application). Modules are independent of the user’s shell, so both tcsh and bash users can use the same commands to change the environment.

Useful Module commands:

Command                                Description
module avail                           List all available modules
module list                            List currently loaded modules
module help modulefile                 Get help on module modulefile
module display modulefile              Display information about modulefile
module load modulefile                 Load modulefile into the current shell environment
module unload modulefile               Remove modulefile from the current shell environment
module swap modulefile1 modulefile2    Unload modulefile1 and load modulefile2

To include particular software in the environment for all new shells, edit your shell configuration file ($HOME/.bashrc for bash users and $HOME/.cshrc for tcsh users) by adding the module commands to load the software that you want to be a part of your environment. After saving your changes, you can source your shell configuration file or log out and then log back in for the changes to take effect.

Note: Order is important. With each module load, the changes are prepended to your current environment paths.
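For example, a bash user who wants a particular compiler and MPI library available in every new shell might add lines like the following to $HOME/.bashrc (a sketch; the module versions shown are ones listed elsewhere in this guide and should be adjusted to what module avail reports):

    # Load the compiler first, then an MPI library built with that compiler (order matters)
    module load gcc/7.2.0
    module load openmpi/3.0.1-gcc-7.2.0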

For additional information on Modules, see the module and modulefile man pages or visit the Modules SourceForge page.

6. Programming Environment

The Intel compilers are available on the Campus Cluster.
module load intel/18.0
[Older versions of the Intel compiler are also available. See the output from the command module avail intel for the specific modules.]

The GNU compilers (GCC) version 4.4.7 are in the default user environment. Version 7.2.0 is also available — load this version with the command:
module load gcc/7.2.0

Compiler Commands

Serial

To build (compile and link) a serial program in Fortran, C, or C++, enter:

GCC:
    gfortran myprog.f
    gcc myprog.c
    g++ myprog.cc

Intel Compiler:
    ifort myprog.f
    icc myprog.c
    icpc myprog.cc

MPI

To build (compile and link) an MPI program in Fortran, C, or C++:

MVAPICH2 (Home Page / User Guide)
    Modulefiles:
        mvapich2/2.1rc1-intel-15.0
        mvapich2/2.2-intel-17.0
        mvapich2/2.2-gcc-6.2.0
    Build commands:
        Fortran 77: mpif77 myprog.f
        Fortran 90: mpif90 myprog.f90
        C: mpicc myprog.c
        C++: mpicxx myprog.cc

Open MPI (Home Page / Documentation)
    Modulefiles:
        openmpi/1.8.4-intel-15.0
        openmpi/2.0.1-gcc-6.2.0
        openmpi/3.0.1-gcc-7.2.0
    Build commands:
        Fortran 77: mpif77 myprog.f
        Fortran 90: mpif90 myprog.f90
        C: mpicc myprog.c
        C++: mpicxx myprog.cc

Intel MPI (Home Page / Documentation)
    Modulefile:
        intel/18.0
    Build commands with GCC:
        Fortran 77: mpif77 myprog.f
        Fortran 90: mpif90 myprog.f90
        C: mpicc myprog.c
        C++: mpicxx myprog.cc
    Build commands with the Intel compilers:
        Fortran 77: mpiifort myprog.f
        Fortran 90: mpiifort myprog.f90
        C: mpiicc myprog.c
        C++: mpiicpc myprog.cc

For example, use the following command to load MVAPICH2 v2.2 built with the Intel 17.0 compiler:

module load mvapich2/2.2-intel-17.0

OpenMP

To build an OpenMP program, use the -fopenmp (GCC) or -qopenmp (Intel) option:

GCC:
    gfortran -fopenmp myprog.f
    gfortran -fopenmp myprog.f90
    gcc -fopenmp myprog.c
    g++ -fopenmp myprog.cc

Intel Compiler:
    ifort -qopenmp myprog.f
    ifort -qopenmp myprog.f90
    icc -qopenmp myprog.c
    icpc -qopenmp myprog.cc

Hybrid MPI/OpenMP

To build an MPI/OpenMP hybrid program, use the -fopenmp (GCC) or -qopenmp (Intel) option with the MPI compiler commands:

GCC (MVAPICH2, Open MPI, Intel MPI):
    mpif77 -fopenmp myprog.f
    mpif90 -fopenmp myprog.f90
    mpicc -fopenmp myprog.c
    mpicxx -fopenmp myprog.cc

Intel Compiler (MVAPICH2, Open MPI):
    mpif77 -qopenmp myprog.f
    mpif90 -qopenmp myprog.f90
    mpicc -qopenmp myprog.c
    mpicxx -qopenmp myprog.cc

Intel Compiler (Intel MPI):
    mpiifort -qopenmp myprog.f
    mpiifort -qopenmp myprog.f90
    mpiicc -qopenmp myprog.c
    mpiicpc -qopenmp myprog.cc

CUDA

NVIDIA GPUs are available as a purchase option in the Golub instance of the Campus Cluster. CUDA is a parallel computing platform and programming model from NVIDIA for use on its GPUs. These GPUs support CUDA compute capability 2.0 or higher.

Load the CUDA Toolkit into your environment using the following module command:

module load cuda

Libraries

The Intel Math Kernel Library (MKL) contains the complete set of functions from the basic linear algebra subprograms (BLAS), the extended BLAS (sparse), and the complete set of LAPACK routines. In addition, there is a set of fast Fourier transforms (FFT) in single- and double-precision, real and complex data types with both Fortran and C interfaces. The library also includes the cblas interfaces, which allow the C programmer to access all the functionality of the BLAS without considering C-Fortran issues. ScaLAPACK, BLACS and the PARDISO solver are also provided by Intel MKL. MKL provides FFTW interfaces to enable applications using FFTW to gain performance with Intel MKL and without changing the program source code. Both FFTW2 and FFTW3 interfaces are provided as source code wrappers to Intel MKL functions.

Load the Intel compiler module to access MKL.

Use the following -mkl flag options when linking with MKL using the Intel compilers:

Sequential libraries: -mkl=sequential
Threaded libraries: -mkl=parallel
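For example, compiling and linking a C program against the threaded MKL libraries with the Intel compiler might look like the following sketch (myprog.c is a placeholder source file):

    module load intel/18.0
    icc -O2 myprog.c -mkl=parallel -o myprog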

To use MKL with GCC, consult the Intel MKL link advisor for the link flags to include.

OpenBLAS, an optimized BLAS library based on GotoBLAS2, is also available. Load the library module (version 0.2.8, built with GCC 4.7.1) with the following command:

module load openblas/0.2.8-gcc

Link with the OpenBLAS library using:

-L /usr/local/math/OpenBLAS/0.2.8-gcc/lib -lopenblas
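Putting the two steps together, a build against OpenBLAS might look like the following sketch (myprog.c is a placeholder source file):

    module load openblas/0.2.8-gcc
    gcc myprog.c -L /usr/local/math/OpenBLAS/0.2.8-gcc/lib -lopenblas -o myprog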

7. Running Jobs

User access to the compute nodes for running jobs is only available via a batch job. The Campus Cluster uses the Torque Resource Manager with the Moab Workload Manager for running batch jobs. Torque is based on OpenPBS, so the commands are the same as PBS commands. See the qsub entry in the Batch Commands section for details on batch job submission.

Please be aware that the interactive nodes are a shared resource for all users of the system and their use should be limited to editing, compiling and building your programs, and for short non-intensive runs.

Starting Friday May 16, 2014, user processes running on the interactive nodes are killed automatically if they accrue more than 30 minutes of CPU time or if more than 4 identical processes owned by the same user are running concurrently.

An interactive batch job provides a way to get interactive access to a compute node via a batch job. See the qsub -I section for information on how to run an interactive job on the compute nodes. A test queue with a very short time limit also provides quick turnaround for debugging purposes.

To ensure the health of the batch system and scheduler, users should refrain from having more than 1,000 batch jobs in the queues at any one time.

See the document Running Serial Jobs Efficiently on the Campus Cluster for information on expediting job turnaround time for serial jobs.

See the Running MATLAB / Mathematica Batch Jobs sections for information on running MATLAB and Mathematica on the campus cluster.

Running Programs

On successful building (compilation and linking) of your program, an executable is created that is used to run the program. The table below describes how to run different types of programs.

Program Type: Serial
How to run: To run serial code, specify the name of the executable.
Example Command:
    ./a.out

Program Type: MPI
How to run: MPI programs are run with the mpiexec command followed by the name of the executable.
Note: The total number of MPI processes is the {number of nodes} x {cores/node} set in the batch job resource specification.
Example Command:
    mpiexec ./a.out
    (Use mpirun under Intel MPI)

Program Type: OpenMP
How to run: The OMP_NUM_THREADS environment variable can be set to specify the number of threads used by OpenMP programs. If this variable is not set, the number of threads used defaults to one under the Intel compiler; under GCC, the default behavior is to use one thread for each core available on the node. To run OpenMP programs, specify the name of the executable.
Example Commands:
    In bash: export OMP_NUM_THREADS=16
    In tcsh: setenv OMP_NUM_THREADS 16
    ./a.out

Program Type: MPI/OpenMP
How to run: As with OpenMP programs, the OMP_NUM_THREADS environment variable can be set to specify the number of threads used by the OpenMP portion of the mixed MPI/OpenMP program. The same default behavior applies with respect to the number of threads used. Use the mpiexec command followed by the name of the executable to run mixed MPI/OpenMP programs.
Note: The number of MPI processes per node is set in the batch job resource specification for number of cores/node.
Example Commands:
    In bash: export OMP_NUM_THREADS=4
    In tcsh: setenv OMP_NUM_THREADS 4
    mpiexec ./a.out
    (Use mpirun under Intel MPI)
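As an illustration of how these run commands fit into a batch job, the body of a script for an OpenMP run on a single 16-core node might look like the following sketch (see Batch Commands below for the resource-specification directives):

    #PBS -l walltime=00:30:00
    #PBS -l nodes=1:ppn=16

    cd $PBS_O_WORKDIR

    # Use one OpenMP thread per requested core
    export OMP_NUM_THREADS=16
    ./a.out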

Primary Queues

Each investor group has unrestricted access to a dedicated primary queue with concurrent access to the number and type of nodes in which they invested.

Secondary Queue

One of the advantages of the Campus Cluster Program is the ability to share resources. A shared secondary queue will allow users access to any idle nodes in the cluster. Users must have access to a primary queue to be eligible to use the secondary queue.

While each investor has full access to the number and type of nodes in which they invested, those resources not fully utilized by each investor will become eligible to run secondary queue jobs. If there are resources eligible to run secondary queue jobs but there are no jobs to be run from the secondary queue, jobs in the primary queues that fit within the constraints of the secondary queue may be run on any otherwise appropriate idle nodes. The secondary queue uses fairshare scheduling.

The current limits in the secondary queue are below:

Queue Max Walltime Max # Nodes
secondary 4 hours 208

Notes:

    • Jobs are routed to the secondary queue when a queue is not specified; i.e., the secondary queue is the default queue on the Campus Cluster.

Test Queue

A test queue is available to provide quick turnaround time for very short jobs.

The current limits in the test queue are:

Queue Max Walltime Max # Nodes
test 5 minutes 2

Batch Commands

Below are brief descriptions of the primary batch commands. For more detailed information, refer to the individual man pages.

qsub

Batch jobs are submitted through a job script using the qsub command. Job scripts generally start with a series of PBS directives that describe requirements of the job such as number of nodes, wall time required, etc. to the batch system/scheduler (PBS directives can also be specified as options on the qsub command line; command line options take precedence over those in the script). The rest of the batch script consists of user commands.

Sample batch scripts are available in the directory /projects/consult/pbs.

The syntax for qsub is:

qsub [list of qsub options] script_name

The main qsub options are listed below. Also see the qsub man page for other options.

  • -l resource_list: specifies resource limits. The resource_list argument is of the form:

    resource_name[=[value]][,resource_name[=[value]],...]


    The common resource_names are:
    walltime=time

    time=maximum wall clock time (hh:mm:ss) [default: 30 mins in primary queues; 10 mins in secondary queue]

    nodes=n:ppn=p

    n=number of 16/20/24/28-core nodes [default: 1 node]
    p=how many cores per node to use (1 through 28) [default: ppn=1]

    nodes=n,flags=allprocs

    allprocs allocates all the cores on every node that is assigned to the job

    Examples:
    -l walltime=00:30:00,nodes=2:ppn=16
    -l walltime=01:00:00,nodes=1,flags=allprocs

    NOTE: The ppn and allprocs options are incompatible, so they should not be used together.

    [For users porting from other systems, note that the -l ncpus syntax may not work as expected on the Campus Cluster, so do not use it. Use only ppn to specify cores per node.]

    Memory needs: Investors with nodes that have varying amounts of memory, as well as users running in the secondary queue, can target nodes with a specific amount of memory. The compute nodes have memory configurations of 24GB, 48GB, 64GB, 96GB, 128GB or 256GB. The resource names to get scheduled on nodes with a specific amount of memory are m24G, m48G, m64G, m96G, m128G and m256G. Not all memory configurations are available in all investor queues. Please check with the technical representative of your investor group to determine what memory configurations are available for the nodes in your primary queue.

    Example:
    -l walltime=00:30:00,nodes=1:ppn=16:m96G

    Note: Do not use the memory specification unless absolutely required, since it could delay scheduling of the job; also, if nodes with the specified memory are unavailable for the specified queue, the job will never run.

    Node Access Policy: Investors can choose to set up their queues to be singlejob, which limits a node to a single job at a time, or shared, which allows multiple jobs per node.

    Please check with the technical representative of your investor group on how your queue is set up. The secondary queue is set up as singlejob.

    A user can override the node access policy of the queue by specifying one of the following directives:

    • For singlejob queues, jobs owned by the same user can share a node with the option:
          -l naccesspolicy=singleuser

    • For shared queues, only a single job can be scheduled on a node with the option:
          -l naccesspolicy=singlejob

    Specifying nodes with GPUs: To run jobs on nodes with GPUs, add the resource specification TeslaM2090 (for Tesla M2090) or TeslaK40M (for Tesla K40M).

    Example:
    -l walltime=00:30:00,nodes=1:ppn=16:TeslaM2090


  • -q queue_name: specifies the queue name. [default: secondary]
  • -N jobname: specifies the job name.
  • -W depend=dependency_list: defines the dependency between the current job and other jobs. See the example jobscript.
  • -t array_request: specifies the task IDs of a job array. The array_request argument is an integer ID or a range of integers. Multiple IDs or ID ranges can be combined in a comma-delimited list. See the example jobscript.
  • -o out_file: stores the standard output of the job in the file out_file. After the job is done, this file will be found in the directory from which the qsub command was issued. [default: <jobname>.o<JobID>]
  • -e err_file: stores the standard error of the job in the file err_file. After the job is done, this file will be found in the directory from which the qsub command was issued. [default: <jobname>.e<JobID>]
  • -j oe: merges standard output and standard error into the standard output file.
  • -V: exports all your environment variables to the batch job.
  • -m be: sends mail at the beginning and end of a job.
  • -M myemail@myuniv.edu: sends email to the given email address.
  • -X: enables X11 forwarding.

Useful PBS Environment Variables:

  • JobID ($PBS_JOBID): the job identifier assigned to the job.
  • Job Submission Directory ($PBS_O_WORKDIR): by default, jobs start in the user’s home directory. To go to the directory from which the job was submitted, use the following line in the batch script: cd $PBS_O_WORKDIR
  • Machine (node) list ($PBS_NODEFILE): the name of the file containing the list of nodes assigned to the batch job.
  • Array JobID ($PBS_ARRAYID): each member of a job array is assigned a unique identifier (see the Job Arrays section).

See the qsub man page for additional environment variables available.
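Putting several of the options and environment variables above together, a complete MPI job script might look like the following sketch (this is an illustration, not one of the samples in /projects/consult/pbs; the module name and executable are placeholders):

    #!/bin/bash
    #PBS -l walltime=00:30:00
    #PBS -l nodes=2:ppn=16
    #PBS -N mympijob
    #PBS -q secondary
    #PBS -j oe
    #PBS -m be
    #PBS -M myemail@myuniv.edu

    # Start in the directory the job was submitted from
    cd $PBS_O_WORKDIR

    # Load the same MPI module used to build the executable
    module load mvapich2/2.2-intel-17.0

    # 2 nodes x 16 cores/node = 32 MPI processes
    mpiexec ./a.out

Submit the script with qsub, for example: qsub myjob.pbs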

qsub -I

The -I option tells qsub you want to run an interactive job on the compute nodes.

For example, the following command:

[golubh1 ~]$ qsub -q ncsa -I -l walltime=00:30:00,nodes=1:ppn=16

will run an interactive job in the ncsa queue with a wall clock limit of 30 minutes, using one node and 16 cores per node. You can also use other qsub options such as those documented above.

After you enter the command, you will have to wait for Torque to start the job. As with any job, your interactive job will wait in the queue until the specified number of nodes is available. If you specify a small number of nodes for smaller amounts of time, the wait should be shorter because your job will backfill among larger jobs. You will see something like this:

qsub: waiting for job 123456.cc-mgmt1.campuscluster.illinois.edu to start

Once the job starts, you will see:

qsub: job 123456.cc-mgmt1.campuscluster.illinois.edu ready

and will be presented with an interactive shell prompt on the launch node. At this point, you can use the appropriate command to start your program.

When you are done with your runs, you can use the exit command to end the job.

qstat

The qstat command displays the status of batch jobs.

  • qstat -a gives the status of all jobs on the system.
  • qstat -u $USER gives the status of your jobs.
  • qstat -n JobID lists nodes allocated to a running job in addition to basic information.
  • qstat -f JobID gives detailed information on a particular job.
  • qstat -q provides summary information on all the queues.
  • qstat -t JobID[] gives the status of all the jobs within a job array. Use JobID[<index>] to display the status of a specific job within a job array.

See the man page for other options available.

qs

The qs command displays a detailed table of the status of PBS batch jobs and can be used as an alternative to the qstat command. Information such as job dependencies, queue wait time, elapsed time, and execution host for running jobs can be viewed at a glance.

qs will display all queued and running jobs. See qs -h for options.

qdel

The qdel command deletes a queued job or kills a running job. To delete/kill jobs within a job array the square brackets “[]” must be specified with the JobID.

  • qdel JobID deletes/kills a job.
  • qdel JobID[] deletes/kills the entire job array.
  • qdel JobID[<index>] deletes/kills a specific job of the job array.

qpeek

The qpeek command displays the contents of a running job’s output spool files. The basic syntax is qpeek JobID which will show the stdout file of the job. To list all available options, type qpeek -help.

Note: You only need to use the numeric part of the JobID when specifying JobID.

8. Job Dependencies

PBS job dependencies allow users to set the order in which their queued jobs run. Job dependencies are set using the -W option, with the syntax -W depend=<dependency type>:<JobID>. PBS places dependent jobs in the Hold state until they are eligible to run.

The following are examples of how to specify job dependencies using the afterany dependency type, which indicates to PBS that the dependent job should become eligible to start only after the specified job has completed.

On the command line:

[golubh1 ~]$ qsub -W depend=afterany:<JobID> jobscript.pbs

In a job script:

#!/bin/bash
#PBS -l walltime=00:30:00
#PBS -l nodes=1:ppn=16
#PBS -N myjob
#PBS -j oe
#PBS -W depend=afterany:<JobID>

In a shell script that submits batch jobs:

#!/bin/bash
JOB_01=`qsub jobscript1.pbs`
JOB_02=`qsub -W depend=afterany:$JOB_01 jobscript2.pbs`
JOB_03=`qsub -W depend=afterany:$JOB_02 jobscript3.pbs`
...

Note: Generally, the recommended dependency types are before, beforeany, after and afterany. While there are additional dependency types, those that work based on batch job error codes may not behave as expected because of the difference between batch job errors and application errors. See the dependency section of the qsub manual page (man qsub) for additional information.

9. Job Arrays

If a need arises to submit the same job to the batch system multiple times, instead of issuing one qsub command for each individual job, users can submit a job array. Job arrays allow users to submit multiple jobs with a single job script using the -t option to qsub. An optional slot limit can be specified to limit the number of jobs that can run concurrently in the job array. See the qsub manual page for details (man qsub). The file names for input, output, etc. can be varied for each job using the job array index value defined by the PBS environment variable PBS_ARRAYID, as shown in the sketch below.

A sample batch script that makes use of job arrays is available in /projects/consult/pbs/jobarray.pbs.
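For illustration only (this is not the contents of the sample file above), a job array script that varies its input file by array index might look like the following sketch; the program ./a.out and the input files input_1.dat, input_2.dat, ... are hypothetical:

    #!/bin/bash
    #PBS -l walltime=00:30:00
    #PBS -l nodes=1:ppn=1
    #PBS -N myarrayjob
    #PBS -j oe
    #PBS -t 1-10

    cd $PBS_O_WORKDIR

    # Each member of the array processes a different input file based on its index
    ./a.out input_${PBS_ARRAYID}.dat > output_${PBS_ARRAYID}.log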

Notes:

  • Valid specifications for job arrays are
    -t 1-10
    -t 1,2,6-10
    -t 8
    -t 1-100%5 (a limit of 5 jobs can run concurrently)
  • You should limit the number of batch jobs in the queues at any one time to 1,000 or less. (Each job within a job array is counted as one batch job.)
  • Interactive batch jobs are not supported with job array submissions.
  • For job arrays, use of any environment variables relating to the JobID (e.g., PBS_JOBID) must be enclosed in double quotes.
  • To delete job arrays, see the qdel command section.

10. Running MATLAB Batch Jobs

See the Using MATLAB on the Campus Cluster page for information on running MATLAB batch jobs.

11. Running Mathematica Batch Jobs

Standard batch job

A sample batch script that runs a Mathematica script is available in /projects/consult/pbs/mathematica.pbs. You can copy and modify this script for your own use. Submit the job with:

[golubh1 ~]$ qsub mathematica.pbs

In an interactive batch job

  • For the GUI (which will display on your local machine), use the -X option with the qsub command:
    qsub -I -X -V -l walltime=00:30:00,nodes=1:ppn=16

    Once the batch job starts, you will have an interactive shell prompt on a compute node. Then type:

    module load mathematica
    mathematica
    

    Note: An X server must be running on your local machine with X11 forwarding enabled within your SSH connection in order to display X applications, GUIs, etc. back on your local machine. Users on Linux-based machines generally only need to enable X11 forwarding by using the -X option with the ssh command, while users on Windows machines will need to ensure that their SSH client has X11 forwarding enabled and that an X server is running. A list of SSH clients (which includes a packaged SSH client and X server combination) can be found in the SSH section. Additional information about running X applications can be found on the Using the X Window System page.

  • For the command line interface:
    qsub -I -V -l walltime=00:30:00,nodes=1:ppn=16

    Once the batch job starts, you will have an interactive shell prompt on a compute node. Then type:

    module load mathematica
    math
    

12. R Software

See the R on the Campus Cluster page for versions available and information on installing add-on packages.

13. HPC & Other Tutorials

The NSF-funded XSEDE program offers online training on various HPC topics—see XSEDE Online Training for links to the available courses.

The Linux Foundation offers an Introduction to Linux course.

14. Investor Specific Information

See here for the technical representative of each investor group and links to investor web sites (if available).