Using the MOAB Workload Manager
This article describes how to use some useful MOAB commands and how to submit different types of jobs to the MOAB scheduler at the FSU HPC. Don't forget to also read the FAQ on Running jobs on the cluster.
MOAB Command Reference
The following is a list of the frequently used MOAB commands.
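Commonly used MOAB client commands include, for example (this is not an exhaustive list, and the exact set installed on the cluster may differ):

msub      - submit a job to the scheduler
checkjob  - show detailed status information for a single job
showq     - display the jobs that are queued, running or blocked
showstart - show the estimated start time of a queued job
showbf    - show which resources are currently free for backfill jobs
canceljob - cancel a queued or running job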
See the cluster resources Command Overview page for more information about each individual command.
This example shows how to run a basic job in MOAB and illustrates some of the useful variables that PBS makes available to users. Below is an example MOAB script, saved in this example as moab.ex1.
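(The script below is a minimal sketch; the exact contents of the original moab.ex1 may differ, but the echoed variables are standard PBS environment variables.)

#!/bin/bash
#MOAB -N moab_ex1
# Print some of the environment variables that PBS/MOAB makes
# available to every job.
echo "Job ID:            $PBS_JOBID"
echo "Job name:          $PBS_JOBNAME"
echo "Submit host:       $PBS_O_HOST"
echo "Submit directory:  $PBS_O_WORKDIR"
echo "Node file:         $PBS_NODEFILE"
echo "Queue:             $PBS_QUEUE"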
The directive #MOAB -N moab_ex1 names the job for MOAB and consequently directs standard output and standard error to the files moab_ex1.o and moab_ex1.e. To join standard output and standard error into a single file moab_ex1.o, add the directive #MOAB -j oe. If no name is specified, the batch system by default creates two files, STDIN.o for standard output and STDIN.e for standard error. The remaining variables are self-explanatory. Below is the output from running this script (saved as moab.ex1) in a user's home directory:
[jmcdon@scs ~]$ msub moab.ex1
There are several MOAB parameters that you can use in the preamble of your script or as parameters to msub. If you give them in your script, each line has to start with "#MOAB", for example "#MOAB -N name". A few useful options are:
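(The options below follow the standard TORQUE/PBS syntax that msub accepts; the mail address is only a placeholder. See the msub man page for the complete list.)

#MOAB -N name                  give the job a name
#MOAB -l walltime=HH:MM:SS     request a maximum wall-clock time
#MOAB -l nodes=X:ppn=Y         request X nodes with Y processors per node
#MOAB -j oe                    join standard output and standard error into one file
#MOAB -o filename              write standard output to filename
#MOAB -q queuename             submit the job to a specific queue
#MOAB -m abe -M user@fsu.edu   send mail when the job aborts, begins or ends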
MOAB does not support the notion of a job array in the way some other batch engines do; for example, only one serial job can be submitted per submit file. To run a serial job under the MOAB system, compile your code as needed and write a script file that executes it. If your program is in the same directory as your submission script, that directory can be referenced inside the job through the environment variable $PBS_O_WORKDIR. The executable in this example is called mytest. The MOAB script would look like:
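A minimal sketch of such a script, saved here as moab.ex3 to match the submission command below (the walltime value is only an example):

#!/bin/bash
#MOAB -N mytest
#MOAB -j oe
#MOAB -l walltime=00:30:00

# Change to the directory the job was submitted from and run the
# serial executable.
cd $PBS_O_WORKDIR
./mytest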
It is not required that this script be an executable shell script, because MOAB ignores the shell command directive. However, having an executable script is useful for debugging purposes if MOAB jobs fail to start. To start your job, run:
$ msub moab.ex3
The command msub returns the jobID from the submitted job. You can check the status of your job using the checkjob command.
$ checkjob 1159
In most cases, you don't want to specify the topology (the exact number of processors per node). If you let MOAB choose the topology for you, your job might run sooner. If there are not enough nodes available to run an interactive job immediately or if the topology you picked is not available, you will have to wait!
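For example, an interactive job can be requested with msub's -I flag (a sketch; the topology and walltime shown are only illustrations, and leaving out the nodes/ppn part lets MOAB choose the topology for you):

$ msub -I -l nodes=2:ppn=4,walltime=01:00:00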
On the HPC cluster, by default all Message Passing Interface libraries are configured to use the infiniband network to pass data. No special user knowledge or action is required to use the infiniband network.
In the next few sections we will show how to run parallel jobs using the three different MPI implementations that are installed on the cluster. The files for these examples can be found in the directory /panfs/storage.local/system/tutorial/example2. The code that we use in these examples, trap.c, is a simple trapezoid integration program.
The user is free to choose which MPI library and compiler best suits his or her needs. However, when one compiles a program with a certain library and compiler, it is best to run the program with these tools. In the following examples you can switch compilers by substituting intel for gnu and vice versa.
To compile the trapezoid program using the Intel compiler, we first have to make sure it is in our path by executing:
$ module load intel
We can check if we have the right compiler by running which mpicc and mpicc -v. To compile our program we run:
$ mpicc -o trap-mpichv2 trap.c -lm
Note: The above command may produce the message: "warning: feupdateenv is not implemented and will always fail." This is normal and no cause for concern unless you know for certain that your compilation requires the use of this function.
To submit the program to the batch system, you must create a startup script with the appropriate topology. Here is an example that requests 8 nodes, and thus a total of 8 processes. The host file that mvapich2 uses must list each host together with the number of processes to start on it, in the form host:N. The mpd daemons are started one per node, and mpirun then starts N processes per node. The argument to mpdboot must match the number of nodes, and the argument to mpirun must match the number of nodes times the number of processes per node.
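A sketch of such a script (the module name intel-mvapich2 and the exact way the host file is built are assumptions; adapt them to the setup described on the software pages):

#!/bin/bash
#MOAB -N TRAP-MPICHV2
#MOAB -j oe
#MOAB -l nodes=8
#MOAB -l walltime=01:00:00

# Set up the mpichv2 (mvapich2) environment for the Intel compiler.
module load intel-mvapich2

cd $PBS_O_WORKDIR

# Build a host file in the host:N format that mvapich2 expects
# (one process per node in this example).
sort -u $PBS_NODEFILE | sed 's/$/:1/' > mpd.hosts

# Start one mpd daemon per node, run 8 processes, then clean up.
mpdboot -n 8 -f mpd.hosts
mpirun -np 8 ./trap-mpichv2
mpdallexit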
This script first sets up the right environment for the mpichv2 paths for the Intel compiler, and then the mpirun program starts the executable trap-mpichv2 on 8 nodes. In this example we have set the walltime to 1 hour (60 minutes). Although it is not imperative to set the walltime property, doing so makes it easier for the scheduler to schedule your job. Be sure not to under-estimate the walltime; rather, over-estimate it a bit.
If the file is saved as trap-mpichv2.sh, the job can be executed with:
$ msub trap-mpichv2.sh
The above script makes the MOAB job submission very flexible. The number of processes you request is satisfied not by an exact topology but by whatever nodes have free CPUs. For example, a job requesting 128 processes may end up running on 64 nodes with 2 free CPUs each, on 128 nodes with 1 free CPU each, or on some combination in between.
This example script can be found on the cluster in /panfs/storage.local/system/tutorial/example2/trap-mpichv2.sh.
OpenMPI is more flexible than mvapich2 with the nodes file and can directly use the PBS_NODEFILE provided by the job scheduler. The setup of the GNU version of OpenMPI is described here. Before we compile our example program, we have to make sure that we use the right compiler:
$ module load gnu-openmpi
We can then compile our trap.c file like:
$ mpicc -o trap-openmpi trap.c
to obtain the OpenMPI executable trap-openmpi. This can now be run with the MOAB script below:
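(A sketch of such a script; the walltime and the nodes=4 request are assumptions chosen to match the example output shown further below, and the module name matches the one used for compiling.)

#!/bin/bash
#MOAB -N TRAP-OPENMPI
#MOAB -j oe
#MOAB -l nodes=4
#MOAB -l walltime=00:15:00

# Set up the GNU OpenMPI environment.
module load gnu-openmpi

cd $PBS_O_WORKDIR

# OpenMPI can read the list of allocated nodes directly from the
# file that the scheduler provides in $PBS_NODEFILE.
mpirun -np 4 -machinefile $PBS_NODEFILE ./trap-openmpi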
Save the moab script as trap-openmpi.sh. Submit the job to MOAB using:
$ msub trap-openmpi.sh
Once the job has completed, you should receive an output file: TRAP-OPENMPI.o which looks like:
Number of procs 4 With n = 1024 trapezoids,
This example was done using the gcc version of OpenMPI. There is also a version of Intel OpenMPI available. The setup of OpenMPI Intel is described here.
Example 5: Submitting a parallel job with mvapich1
The mvapich implementation of mpich version 1 is installed on the HPC cluster. This version seamlessly integrates the communication layers for the infiniband fabric for the user, and it is one of the easiest implementations to use. However, it suffers in both robustness and ease of cleanup. The use of mpich version 2 or OpenMPI is strongly encouraged over this version.
To compile our test program with the gnu compiler, we run the following commands:
$ module load gnu-mvapich
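followed by the compile command itself (the executable name trap-mpichv1 is chosen here only for illustration):

$ mpicc -o trap-mpichv1 trap.c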
Using the following script, we can submit our job:
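(A sketch of such a script; depending on the local installation the mvapich1 launcher may be mpirun or mpirun_rsh, and the topology and walltime are only examples.)

#!/bin/bash
#MOAB -N TRAP-MPICHV1
#MOAB -j oe
#MOAB -l nodes=4
#MOAB -l walltime=00:15:00

# Use the same environment the program was compiled with.
module load gnu-mvapich

cd $PBS_O_WORKDIR

# Start the program on the nodes listed in $PBS_NODEFILE.
mpirun -np 4 -machinefile $PBS_NODEFILE ./trap-mpichv1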
When saved in the file mpichv1.sh, we can submit our program using:
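$ msub mpichv1.sh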
One can indicate a dependency between two or more jobs by using the -W x=depend:dependency-type:jobid flag for the msub command. For example, to tell MOAB that job2.sh can only run after job1.sh successfully finishes, one can use:
$ msub job1.sh
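Assuming msub reports the jobID 12345 for job1.sh, job2.sh can then be submitted with a dependency on it (afterok is a commonly used dependency-type for a successful finish; check the MOAB documentation for the exact type names supported on the cluster):

$ msub -W x=depend:afterok:12345 job2.sh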
This will queue job1.sh with a jobid of 12345, queue job2.sh with jobid 12346, and indicate to MOAB that job2.sh has to wait until job1.sh has finished.
There are a number of job queues available on the cluster, and depending on your user type (owner-based or general access) you can submit your job to one of them. For the current list of queues, see this list.
Jobs submitted to the backfill queue have a wallclock time restriction: their runtime cannot exceed 4 hours. When that time limit is exceeded, your job will be cancelled. If you try to reserve a walltime longer than 4 hours, the scheduler will refuse your job.
Run a job in the backfill queue:
$ msub -q backfill trap-mpichv2.sh
Run a job with high-priority:
$ msub -l qos=scs_high trap-mpichv2.sh
Change the quality of service of a job that is waiting:
$ checkjob 23831
For more information, see the slides from the tech series lectures.