Using the MOAB Workload Manager

This article describes how to use some useful MOAB commands and how to submit different types of jobs to the MOAB scheduler at the FSU HPC. Don't forget to also read the FAQ on Running jobs on the cluster.

 

MOAB Command Reference

The following is a list of the frequently used MOAB commands.

showq
list the jobs in the current queue
msub [-q queuename] script
submit a job to the MOAB batch queue. If no queuename is specified, the default queue is used. The job submission script is required. Use msub --help to see the full list of options. msub returns the jobid.
checkjob [-v] jobid
Shows the status of a job, which is useful when there are problems with it. If your job has been deferred or blocked, this command shows the reason why it did not start.
showstats
shows the job submission history statistics.
canceljob
cancels a job, takes as an argument the jobID. As a last resort, you can also use mjobctl -C jobID or /opt/torque/bin/qdel jobID if canceljob gives an error.
showstart
gives an estimate of when your job will start to run. Since many people don't indicate the predicted run time of their jobs, this estimate can be completely wrong.
showbf
show the available system resources

See the cluster resources Command Overview page for more information about each individual command.

Examples:
$ showbf -c genacc_q 
(show the available resources in the general access queue)
$ showq -r -w class=coaps_q 
(show the jobs running in the COAPS owner queue)
$ showq -i -w class=coaps_q 
(show idle jobs in the COAPS owner queue)
$ showq -r -w qos=coaps_high 
(show jobs running with QOS coaps_high)

Example 1: Basic MOAB Script

This example runs a simple job in MOAB and prints some of the useful variables that PBS makes available to users. Below is an example MOAB script, saved as moab.ex1.

#!/bin/bash

#MOAB -N moab_ex1

echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo MOAB: qsub is running on $PBS_O_HOST
echo MOAB: originating queue is $PBS_O_QUEUE
echo MOAB: executing queue is $PBS_QUEUE
echo MOAB: working directory is $PBS_O_WORKDIR
echo MOAB: execution mode is $PBS_ENVIRONMENT
echo MOAB: job identifier is $PBS_JOBID
echo MOAB: job name is $PBS_JOBNAME
echo MOAB: node file is $PBS_NODEFILE
echo MOAB: current home directory is $PBS_O_HOME
echo MOAB: PATH = $PBS_O_PATH
echo ----------------------------------------------
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo ' '
echo ' '

The directive #MOAB -N moab_ex1 names the job for MOAB; the standard output and standard error are consequently written to files named moab_ex1.o and moab_ex1.e, each followed by the job ID. To join the standard output and the standard error in a single file, add the directive #MOAB -j oe. If no name is specified, the batch system by default creates two files, one for standard output and one for standard error, named STDIN.o and STDIN.e, again followed by the job ID. The remaining variables are self-explanatory. Below is the output from running this script (saved as moab.ex1) in a user's home directory:

[jmcdon@scs ~]$ msub moab.ex1

1157
$ cat moab_ex1.o1157
------------------------------------------------------
Job is running on node hpc-3-1.local
------------------------------------------------------
MOAB: qsub is running on admin.hpc.fsu.edu
MOAB: originating queue is default
MOAB: executing queue is default
MOAB: working directory is /home/jmcdon
MOAB: execution mode is PBS_BATCH
MOAB: job identifier is 1157.admin.hpc.fsu.edu
MOAB: job name is moab_ex1
MOAB: node file is /opt/torque/aux//1157.admin.hpc.fsu.edu
MOAB: current home directory is /home/jmcdon
MOAB: PATH = /sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin
----------------------------------------------
------------------------------------------------------
Job is running on node hpc-3-1.local
------------------------------------------------------
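
The output file name above combines the job name with the numeric part of the job identifier (moab_ex1 and 1157.admin.hpc.fsu.edu give moab_ex1.o1157). The following is a minimal sketch of that rule, assuming the naming pattern seen in this example holds for your jobs. The function name output_file_for is hypothetical and not part of MOAB.

```shell
#!/bin/bash
# Hypothetical helper (not a MOAB command): derive the standard output file
# name from a job name and a job identifier such as 1157.admin.hpc.fsu.edu.
output_file_for() {
    local name="$1"
    local jobid="$2"
    # Keep only the part of the identifier before the first dot.
    printf '%s.o%s\n' "$name" "${jobid%%.*}"
}

output_file_for moab_ex1 1157.admin.hpc.fsu.edu   # prints moab_ex1.o1157
```

Inside a job script the same call could be made with "$PBS_JOBNAME" and "$PBS_JOBID" as arguments.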

Moab parameters

There are several moab parameters that you can use in the preamble of your script or as a parameter to msub. If you give them in your script, you have to start the line with "#MOAB", for example "#MOAB -N name". A few useful options are:

Option                 Meaning
-N name                Sets the name of your job (and of its output files) to "name".
-l nodes=n             Requests n nodes for this job.
-j oe                  Writes the standard output and the standard error to the same log file.
-m abe                 Has MOAB mail you a notification if the job is aborted (a), begins (b) or ends (e). You can use any combination of these letters, for example "-m e" if you are only interested in when a job finishes.
-l walltime=hh:mm:ss   Tells the scheduler that your job will run for hh hours, mm minutes and ss seconds. Remember that some queues have a time limit, for example 4 hours for the backfill queue, and that MOAB will reject your job if you request more time. The default walltime is 14 days for most queues, while the maximum is 90 days. For the backfill queue, both the default and the maximum walltime are 4 hours. See the FAQ on how to increase the walltime of a running job, up to the maximum walltime.
-q queue               Specifies the queue to run in.
-l qos=QOS             Specifies the quality of service QOS for this job. See the section on the different queues for all quality of service levels that you can use.
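
The walltime option expects the form hh:mm:ss. As a small sketch, the helper below converts a run time given in minutes to that form so that it can be passed to msub on the command line. The function name walltime_from_minutes, the job name and the script name are all hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: format a duration given in whole minutes as hh:mm:ss.
walltime_from_minutes() {
    local minutes="$1"
    printf '%02d:%02d:00\n' $((minutes / 60)) $((minutes % 60))
}

walltime_from_minutes 90    # prints 01:30:00

# Possible use on the cluster, combining options from the table above
# (myjob and myjob.sh are placeholders):
#   msub -N myjob -j oe -m e -l walltime="$(walltime_from_minutes 90)" myjob.sh
```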

Example 2: Submission of a serial type job.

MOAB does not support job arrays as some other batch engines do, so each submit file can contain only one serial job. To run a serial job under MOAB, compile your code as needed and write a script that executes it. The directory from which you submit the job is available in the script as the environment variable $PBS_O_WORKDIR, so if your program is in that directory the script can change to it first. The executable in this example is called mytest. The MOAB script would look like:

#!/bin/bash

#MOAB -j oe
#MOAB -l walltime=120:00

cd $PBS_O_WORKDIR
./mytest

The script does not have to be an executable shell script, because MOAB ignores the shell command directive. However, an executable script is useful for debugging if MOAB jobs fail to start. If the script is saved as moab.ex3, start your job by running:

$ msub moab.ex3
1159

The command msub returns the jobID from the submitted job. You can check the status of your job using the checkjob command.

$ checkjob 1159
.....
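
Because msub prints the jobID, the submit and check steps can be chained in a script. The sketch below assumes that msub may print blank lines around the ID (as in the Example 1 output) and keeps the first non-empty line. The function name extract_jobid is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read msub output on stdin and print the job ID.
# The first non-empty line is used and surrounding whitespace is dropped.
extract_jobid() {
    awk 'NF { print $1; exit }'
}

printf '\n1159\n' | extract_jobid    # prints 1159

# Possible use on the cluster:
#   jobid=$(msub moab.ex3 | extract_jobid)
#   checkjob "$jobid"
```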

Running an interactive job

To request an interactive job, pass the -I option to msub:

Allocate a single core
msub -I
Allocate four cores on one node
msub -l nodes=1:ppn=4 -I
Allocate five cores on multiple nodes
msub -l nodes=5 -I
Allocate three cores on three nodes
msub -l nodes=3:ppn=1 -I

In most cases, you don't want to specify the topology (the exact number of processors per node). If you let MOAB choose the topology for you, your job might run sooner. If there are not enough nodes available to run an interactive job immediately or if the topology you picked is not available, you will have to wait!
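
The difference between fixing the topology and leaving it to MOAB comes down to whether ppn is given. As an illustration only, the hypothetical helper interactive_cmd below prints the msub command line for either case without running it.

```shell
#!/bin/bash
# Hypothetical helper: print the msub command for an interactive job.
#   $1 = value for nodes=
#   $2 = optional processors per node; omit it to let MOAB pick the topology
interactive_cmd() {
    local nodes="$1"
    local ppn="${2:-}"
    if [ -n "$ppn" ]; then
        echo "msub -l nodes=${nodes}:ppn=${ppn} -I"
    else
        echo "msub -l nodes=${nodes} -I"
    fi
}

interactive_cmd 5      # prints: msub -l nodes=5 -I
interactive_cmd 1 4    # prints: msub -l nodes=1:ppn=4 -I
```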


Running an MPI job

On the HPC cluster, all Message Passing Interface libraries are configured by default to use the InfiniBand network to pass data. No special user knowledge or action is required to use the InfiniBand network.

In the next few sections we will show how to run parallel jobs using the three different MPI implementations that are installed on the cluster. The files for these examples can be found in the directory /panfs/storage.local/system/tutorial/example2. The code that we use in these examples, trap.c, is a simple trapezoid integration program.

The user is free to choose which MPI library and compiler best suits his or her needs. However, when one compiles a program with a certain library and compiler, it is best to run the program with these tools. In the following examples you can switch compilers by substituting intel for gnu and vice versa.


Example 3: Submitting a parallel job with mvapich2

To compile the trapezoid program using the Intel compiler, we first have to make sure the compiler is in our path by executing:

$ module load intel
$ module load intel-mvapich2

We can check if we have the right compiler by running which mpicc and mpicc -v. To compile our program we run:

$  mpicc -o trap-mpichv2 trap.c  -lm

Note: The above command may produce the message: "warning: feupdateenv is not implemented and will always fail." This is normal and no cause for concern unless you know for certain that your compilation requires the use of this function.

To submit the program to the batch system, you must create a startup script with the appropriate topology. Here is an example that requests 8 nodes and thus a total of 8 processes. The host file that mvapich2 uses must give the number of processors to use on each host as host:N, where N is the number of processes. The mpdboot processes are started one per node, and mpirun should start N processes per node. The argument to mpdboot must match the number of nodes, and the argument to mpirun must match the number of nodes times the number of processes per node.

#!/bin/bash

#MOAB -l nodes=8
#MOAB -j oe
#MOAB -l walltime=60:00

module load intel-mvapich2

mpirun $PBS_O_WORKDIR/trap-mpichv2

This script first sets up the environment for the Intel build of mvapich2, and then mpirun starts the executable trap-mpichv2 on 8 nodes. In this example we have set the walltime to 1 hour (60 minutes). Although setting the walltime is not required, it makes it easier for the scheduler to schedule your job. Be sure not to underestimate the walltime; overestimate it a bit instead.

If the file is saved as trap-mpichv2.sh, the job can be executed with:

$ msub trap-mpichv2.sh 

The above script makes the MOAB job submission very flexible. The processes that you request are placed not according to a requested topology but on whichever nodes have free CPUs. For example, a 128-node job may end up running on 64 nodes with 2 free CPUs each, on 128 nodes with 1 free CPU each, or on some combination of these.

This example script can be found on the cluster in /panfs/storage.local/system/tutorial/example2/trap-mpichv2.sh.
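
To see how the scheduler actually distributed a job, the node file can be summarised in the host:N form mentioned earlier. This is a sketch under the assumption that $PBS_NODEFILE contains one line per allocated processor. The function name nodefile_to_hosts and the host names in the demonstration are made up.

```shell
#!/bin/bash
# Hypothetical helper: summarise a PBS node file as host:N lines, where N is
# the number of processes assigned to that host.
# Assumption: the node file has one line per allocated processor.
nodefile_to_hosts() {
    sort "$1" | uniq -c | awk '{ print $2 ":" $1 }'
}

# Demonstration with a made-up node file; inside a job you would pass
# "$PBS_NODEFILE" instead.
demo=$(mktemp)
printf 'hpc-3-1.local\nhpc-3-2.local\nhpc-3-1.local\n' > "$demo"
nodefile_to_hosts "$demo"
# prints:
#   hpc-3-1.local:2
#   hpc-3-2.local:1
rm -f "$demo"
```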


Example 4: Submitting OpenMPI jobs

OpenMPI is more flexible than mvapich2 with the nodes file and can directly use the PBS_NODEFILE given by the job scheduler. The setup of the GNU version of OpenMPI is described here. Before we compile our example program, we have to make sure that we use the right compiler:

 $ module load gnu-openmpi

We can then compile our trap.c file as follows:

 $ mpicc -o trap-openmpi trap.c

to obtain the OpenMPI executable trap-openmpi. This can now be run with the MOAB script below:

#!/bin/bash

#MOAB -l nodes=4
#MOAB -j oe
#MOAB -l walltime=120:00

#MOAB -N TRAP-OPENMPI
module load gnu-openmpi

mpirun $PBS_O_WORKDIR/trap-openmpi

Save the moab script as trap-openmpi.sh. Submit the job to MOAB using:

$ msub trap-openmpi.sh 

Once the job has completed, you should receive an output file whose name starts with TRAP-OPENMPI.o and which looks like:

Number of procs 4 With n = 1024 trapezoids, 
our estimate of the integral from 0.000000
to 1.000000 = 1.000000

This example was done using the gcc version of OpenMPI. There is also a version of Intel OpenMPI available. The setup of OpenMPI Intel is described here.
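
Because OpenMPI can read the node file directly, the process count and host list can also be passed to mpirun explicitly. The sketch below only prints the command line it would run. The options -np and -machinefile are standard Open MPI mpirun options; the function name openmpi_launch_line and the assumption of one node-file line per process are illustrative.

```shell
#!/bin/bash
# Hypothetical helper: print an explicit Open MPI launch line for a node file.
# Assumption: the node file has one line per process to start.
openmpi_launch_line() {
    local nodefile="$1"
    local program="$2"
    local nprocs
    nprocs=$(grep -c . "$nodefile")    # count the non-empty lines
    echo "mpirun -np $nprocs -machinefile $nodefile $program"
}

# Possible use inside a job script:
#   openmpi_launch_line "$PBS_NODEFILE" "$PBS_O_WORKDIR/trap-openmpi"
```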


Example 5: Submitting a parallel job with mvapich1

The mvapich implementation of mpich version 1 is installed on the HPC cluster. This version seamlessly integrates the communication layers for the InfiniBand fabric for the users. It is one of the easiest implementations to use. However, it suffers in both robustness and ease of cleanup. The use of mpich version 2 or openmpi is strongly encouraged over this version.

To compile our test program with the gnu compiler, we run the following commands:

$ module load gnu-mvapich
$ mpicc -o trap-mpichv1 trap.c -lm

Using the following script, we can submit our job:

#!/bin/bash

#MOAB -l nodes=8
#MOAB -j oe
#MOAB -N TRAP-MPICHV1
#MOAB -l walltime=120:00

module load gnu-mvapich
mpirun $PBS_O_WORKDIR/trap-mpichv1

When saved in the file mpichv1.sh, we can submit our program using:

 $ msub mpichv1.sh

Job Dependencies

One can indicate a dependency between two or more jobs by using the -W x=depend:dependency-type:jobid flag of the msub command. For example, to tell MOAB that job2.sh can only run after job1.sh finishes successfully, one can use:

$ msub job1.sh
12345

$ msub -W x=depend:afterok:12345 job2.sh
12346

This queues job1.sh with a jobid of 12345, queues job2.sh with jobid 12346, and tells MOAB that job2.sh has to wait until job1.sh has finished successfully.
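
For longer pipelines the same pattern can be scripted. The following is a sketch rather than a site-provided tool: the function name submit_chain is hypothetical, and it assumes the submit command prints the jobid as its first non-empty output line. The submit command is passed as an argument so that the function can be tried with a stand-in before using msub.

```shell
#!/bin/bash
# Hypothetical helper: submit scripts so that each one starts only after the
# previous one has finished successfully (afterok).
#   $1    = submit command (msub on the cluster)
#   $2... = job scripts, in the order they should run
submit_chain() {
    local submit="$1"
    shift
    local prev=""
    local jobid script
    for script in "$@"; do
        if [ -z "$prev" ]; then
            jobid=$("$submit" "$script" | awk 'NF { print $1; exit }')
        else
            jobid=$("$submit" -W "x=depend:afterok:$prev" "$script" | awk 'NF { print $1; exit }')
        fi
        if [ -z "$jobid" ]; then
            echo "submission of $script failed" >&2
            return 1
        fi
        echo "$script -> $jobid"
        prev="$jobid"
    done
}

# Possible use on the cluster:
#   submit_chain msub job1.sh job2.sh job3.sh
```

Stopping at the first failed submission avoids queueing later jobs without the dependency they were meant to have.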

Dependency types

Dependency    Format                          Description
after         after:jobid[:jobid]...          Job may start at any time after specified jobs have started execution.
afterany      afterany:jobid[:jobid]...       Job may start at any time after all specified jobs have completed regardless of completion status.
afterok       afterok:jobid[:jobid]...        Job may start at any time after all specified jobs have successfully completed.
afternotok    afternotok:jobid[:jobid]...     Job may start at any time after any specified jobs have completed unsuccessfully.
before        before:jobid[:jobid]...         Job may start at any time before specified jobs have started execution.
beforeany     beforeany:jobid[:jobid]...      Job may start at any time before all specified jobs have completed regardless of completion status.
beforeok      beforeok:jobid[:jobid]...       Job may start at any time before all specified jobs have successfully completed.
beforenotok   beforenotok:jobid[:jobid]...    Job may start at any time before any specified jobs have completed unsuccessfully.
on            on:count                        Job may start after count dependencies on other jobs have been satisfied.
synccount     synccount:count                 Job must start at the same time as count other jobs that reference this job using the syncwith keyword.
syncwith      syncwith:jobid                  Job must start at the same time as jobid.

MOAB Queues

There are a number of job queues available on the cluster, and depending on your user type (owner-based or general access) you can submit your job to one of them. For a current list of queues, see this list.

Time limitations

Jobs submitted to the backfill queue have a wallclock time restriction: their runtime cannot exceed 4 hours. When that limit is exceeded, your job will be cancelled. If you try to reserve a walltime longer than 4 hours, the scheduler will refuse your job.
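
A walltime request can be checked against this limit before submitting. The sketch below assumes walltime strings of the form ss, mm:ss or hh:mm:ss, matching the 60:00 and 120:00 values used in the earlier examples. Both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers: check a walltime request against the 4-hour limit
# of the backfill queue.

# Convert ss, mm:ss or hh:mm:ss to seconds.
walltime_to_seconds() {
    echo "$1" | awk -F: '
        NF == 1 { print $1 + 0 }
        NF == 2 { print $1 * 60 + $2 }
        NF == 3 { print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeed if the walltime is at most 4 hours.
fits_backfill() {
    [ "$(walltime_to_seconds "$1")" -le $((4 * 3600)) ]
}

if fits_backfill 120:00; then
    echo "120:00 fits in the backfill queue"
fi
```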

Examples

Run a job in the backfill queue:

 $ msub -q backfill trap-mpichv2.sh 

Run a job with high priority:

 $ msub -l qos=scs_high trap-mpichv2.sh 

Change the quality of service of a job that is waiting:

$ checkjob 23831
...
Creds: user:paulvdm group:paulvdm class:scs_q qos:scs_high
...
$ setqos med 23831
$ checkjob 23831
...
Creds: user:paulvdm group:paulvdm class:scs_q qos:med
...
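
To confirm such a change in a script, the QOS can be read from the Creds line of the checkjob output. This is a sketch that assumes the Creds line has the layout shown above. The function name job_qos is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read checkjob output on stdin and print the job's QOS,
# taken from the qos: field of the "Creds:" line.
job_qos() {
    sed -n 's/^[[:space:]]*Creds:.*qos:\([^[:space:]]*\).*/\1/p'
}

echo 'Creds: user:paulvdm group:paulvdm class:scs_q qos:scs_high' | job_qos
# prints scs_high

# Possible use on the cluster:
#   checkjob 23831 | job_qos
```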

For more information, see the slides from the tech series lectures.

Attachments:
File                                   File size
trap.c (a simple MPI test program)     2 Kb