RepMaker: a program for running PAUP* bootstrap analyses in parallel

RepMaker is a Perl script written by Jim Wilgenbusch that automatically generates multiple NEXUS files. It is usually used to distribute a single PAUP* bootstrap analysis over multiple CPUs.

Using RepMaker on HPC

Two versions of the RepMaker command are available on HPC: repmaker_moab, for use with the MOAB job manager, and repmaker_condor, for use with the Condor job manager. Both commands are run interactively to create the files needed for a PAUP* bootstrap analysis. You will need at least two files: a data file and an input file. The input file must be named input.nex and contain a PAUP block with the model or run parameters. See the RepMaker web page for details. The following quick-start guide shows how to use RepMaker on the HPC.
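The file requirements above can be checked before running either command. The following is a minimal sketch, not part of RepMaker: the function name check_repmaker_inputs is hypothetical, and it assumes that both files begin with the #NEXUS token on their first line and that the PAUP block opens with "begin paup;".

```shell
#!/bin/sh
# check_repmaker_inputs DATAFILE
# Hypothetical pre-flight helper (not shipped with RepMaker).
# Checks that DATAFILE and an input.nex in the same directory both exist
# and look like NEXUS files, and that input.nex contains a PAUP block.
check_repmaker_inputs() {
    data_file=$1
    input_file="$(dirname "$data_file")/input.nex"

    for f in "$data_file" "$input_file"; do
        if [ ! -f "$f" ]; then
            echo "missing file: $f" >&2
            return 1
        fi
        # ASSUMPTION: the #NEXUS token is on the very first line.
        if ! head -n 1 "$f" | grep -qi '^#NEXUS'; then
            echo "not a NEXUS file: $f" >&2
            return 1
        fi
    done

    # The input file must contain a PAUP block ("begin paup;").
    if ! grep -qi 'begin[[:space:]][[:space:]]*paup[[:space:]]*;' "$input_file"; then
        echo "no PAUP block found in: $input_file" >&2
        return 1
    fi
    return 0
}
```

For example, "check_repmaker_inputs yourdata.nex" returns success only when yourdata.nex and input.nex sit side by side and pass these checks. It does not validate the contents of the PAUP block itself.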

You must initialize the bioinformatics suite to use this software.

RepMaker with MOAB

MOAB doesn't allow job arrays, so after running the repmaker_moab script you will need an additional utility script to submit the bootstrap analyses on the HPC. Here is a sample workflow for running this type of analysis on HPC under the MOAB scheduler:

  1. Create your input.nex file in the same directory as your data file, both in NEXUS format.
  2. Run the command: "repmaker_moab yourdata.nex"
  3. Change directories: "cd RepFiles"
  4. Run the command: "submitAll"

Before submitting your job, make sure that each bootstrap replicate will complete in less than four hours. Bootstrap replicates are automatically sent to the HPC backfill queue, which can only run jobs for a maximum of four hours. If a replicate runs for more than four hours, it will be killed, and you will have lost both that replicate and the four hours used to run it. If your replicates take longer than four hours to complete, consider submitting your bootstrap analysis to the Condor queue instead.
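The four-hour rule can be written down as a small decision helper. This is only a sketch: suggest_queue is a hypothetical function, and it assumes you have already timed or estimated the run time of a single replicate in seconds (for example, from a short test run).

```shell
# Backfill queue limit: 4 hours * 60 minutes * 60 seconds.
BACKFILL_LIMIT=14400

# suggest_queue SECONDS
# Hypothetical helper. SECONDS is the measured or estimated run time of
# ONE bootstrap replicate. Prints "moab" when the replicate finishes in
# less than four hours, and "condor" otherwise.
suggest_queue() {
    secs=$1
    if [ "$secs" -lt "$BACKFILL_LIMIT" ]; then
        echo "moab"
    else
        echo "condor"
    fi
}
```

With these definitions, "suggest_queue 5400" (a 1.5 hour replicate) prints "moab" and "suggest_queue 18000" (a 5 hour replicate) prints "condor". Replicate times vary, so leave yourself a margin when the estimate is close to the limit.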

RepMaker with Condor

Condor excels at executing large, embarrassingly parallel job arrays, such as phylogenetic non-parametric bootstrap analyses. Because the HPC communication fabric (InfiniBand) is best suited to tightly coupled parallelism via MPI, which independent bootstrap replicates do not need, it is recommended that you submit your PAUP* bootstrap jobs to the Condor job queue. Here is a sample workflow for running this type of analysis on HPC under the Condor job scheduler:

  1. Create your input.nex file in the same directory as your data file, both in NEXUS format.
  2. Run the command: "repmaker_condor yourdata.nex"
  3. Change directories: "cd RepFiles"
  4. Run the command: "condor_submit repMaker.cmd"

RepMaker related utilities

After all of the independent bootstrap replicates have completed, you should be left with a tree file for each replicate. Occasionally the tree file for a given replicate will not be transferred back to the head node from which the job was submitted. To check whether all the necessary replicate files have been returned to the submit node, use the "rmcheck" script. The "rmcheck" command requires one argument: the prefix common to all bootstrap replicate tree files. For example:
  • Run the command: "rmcheck rep_bstrees"
where "rep_bstrees" is the prefix of each tree file saved by the bootstrap run.
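To illustrate the kind of check involved, the sketch below lists replicate tree files that are missing or empty. It is not the rmcheck script, whose internals are not described here. The function name list_missing_reps, the second argument (the number of replicates), and the file naming pattern PREFIX_1.tre ... PREFIX_N.tre are all assumptions made for the example; adjust the pattern to the names your run actually produces.

```shell
# list_missing_reps PREFIX COUNT
# Hypothetical stand-in for a replicate check (NOT the real rmcheck).
# ASSUMPTION: tree files are named PREFIX_1.tre ... PREFIX_COUNT.tre.
# Prints the name of every missing or empty file and returns non-zero
# if any replicate is missing.
list_missing_reps() {
    prefix=$1
    count=$2
    missing=0
    i=1
    while [ "$i" -le "$count" ]; do
        f="${prefix}_${i}.tre"
        # -s is true only when the file exists and is not empty.
        if [ ! -s "$f" ]; then
            echo "$f"
            missing=$((missing + 1))
        fi
        i=$((i + 1))
    done
    [ "$missing" -eq 0 ]
}
```

For example, "list_missing_reps rep_bstrees 100" would print the expected file names that are absent among 100 replicates, which tells you which replicates need to be rerun.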