Using the MOAB Workload Manager - MOAB MPICH2 Job
Example 3: Submitting a parallel job with mvapich2

To compile the trapezoid program using the Intel compiler, we first have to make sure the compiler is in our path by executing:

    $ module load intel

We can check that we have the right compiler by running which mpicc and mpicc -v. To compile our program we run:

    $ mpicc -o trap-mpichv2 trap.c -lm

Note: The above command may produce the message "warning: feupdateenv is not implemented and will always fail." This is normal and no cause for concern unless you know for certain that your compilation requires this function.

To submit the program to the batch system, you must create a startup script with the appropriate topology. Here is an example that requests 8 nodes with one process per node, for a total of 8 processes. The host file that mvapich2 uses must list each host as host:N, where N is the number of processes to run on that host. mpdboot starts one mpd process per node, and mpirun should start N processes per node. The argument to mpdboot must therefore match the number of nodes, and the argument to mpirun must match the number of nodes times the number of processes per node.

    #!/bin/bash

This script first sets up the environment with the mpichv2 paths for the Intel compiler, and then mpirun starts the executable trap-mpichv2 on 8 nodes. In this example we have set the walltime to 1 hour (60 minutes). Although setting the walltime is not mandatory, it makes it easier for the scheduler to schedule your job. Be sure not to underestimate the walltime; overestimate it a bit instead.

If the file is saved as trap-mpichv2.sh, the job can be submitted with:

    $ msub trap-mpichv2.sh

The above script makes the Moab job submission very flexible. The processes that you request are placed not according to the requested topology but according to which nodes have free CPUs. For example, a job that requests 128 processes may end up running on 64 nodes with 2 free CPUs each, on 128 nodes with 1 free CPU each, or on some combination of the two.
This example script can be found on the cluster in /panfs/storage.local/system/tutorial/example2/trap-mpichv2.sh.
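
The relationship between the host file, the mpdboot argument, and the mpirun argument described above can be illustrated with a short sketch. This is not the cluster's script: the variable names NODEFILE and HOSTFILE and the host names node01 and node02 are made up for illustration, and the sketch assumes the scheduler supplies a node list with one line per allocated CPU (on many TORQUE/Moab installations this is the file named by $PBS_NODEFILE). The launch commands are only printed, not executed.

```shell
#!/bin/bash
# Sketch only: build a host:N host file and derive the mpdboot/mpirun
# counts from a node list. NODEFILE, HOSTFILE and the host names are
# assumptions for illustration, not taken from the cluster's script.

# Fake a scheduler node list: one line per allocated CPU.
# (The temporary files are left in place; remove them when done.)
NODEFILE=$(mktemp)
HOSTFILE=$(mktemp)
printf '%s\n' node01 node01 node02 node02 > "$NODEFILE"

# Collapse repeated host names into "host:N" lines.
sort "$NODEFILE" | uniq -c | awk '{ print $2 ":" $1 }' > "$HOSTFILE"

# mpdboot wants the number of nodes; mpirun wants the total process count
# (nodes times processes per node), which equals the number of CPU lines.
NNODES=$(wc -l < "$HOSTFILE" | tr -d ' ')
NPROCS=$(wc -l < "$NODEFILE" | tr -d ' ')

# Show the host file, then print the launch commands instead of running
# them, so the sketch is safe to try on a machine without MPI installed.
cat "$HOSTFILE"
echo "mpdboot -n $NNODES -f $HOSTFILE"
echo "mpirun -np $NPROCS ./trap-mpichv2"
```

With the sample node list, the host file contains two host:2 entries, so mpdboot is given 2 (the number of nodes) and mpirun is given 4 (2 nodes times 2 processes per node). In a real job script the two echo lines would be replaced by the actual commands.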


