| MPI-BLAST: parallel BLAST services |
|
LANL's MPI-BLAST is included in the Rocks Bio Roll and is available on all HPC nodes. It is a parallel implementation, built on the Message Passing Interface (MPI) library, of BLAST, the world's most popular sequence similarity search tool. It can be used to find similarity between protein or nucleotide sequence queries and sequence databases, including various translated searches.

Using MPI-BLAST on HPC

The Rocks implementation of MPI-BLAST is described in the Bio Roll documentation and, in more detail, on the MPI-BLAST home page. It requires a partitioned database. Currently all databases in the default location (specified by the NCBI variable Shared) are partitioned except UniProt, HTC, and RS_RNA. You can also create custom partitioned databases in your own account by modifying your .ncbirc file so that the Shared variable points to the location of your own database directory. There is no need to change the Local variable.

The example command below partitions a FASTA-format database flat file named human_pep in your local directory into 16 fragments; the -pT option specifies that the sequences are protein. The output is written to whatever path is specified by the Shared variable in your .ncbirc file. The number of partitions can be higher, but the consensus is that returns diminish beyond a certain point; 16 is a decent compromise for our system.

    source /usr/local/profile.d/openmpi-gnu.sh
    mpiformatdb --nfrags=16 -i human_pep -pT --quiet

To see the rest of the options for mpiformatdb (which is actually a formatdb wrapper):

    mpiformatdb --help

After that, submit your MPI-BLAST job with msub using some variation of the following MOAB script:

    #!/bin/bash
    # See for other options:
    # http://www.clusterresources.com/products/mwm/docs/commands/msub.shtml
    #MOAB -l nodes=64
    #MOAB -j oe
    #MOAB -m abe
    #MOAB -N MPIBLASTRUN
    source /usr/local/profile.d/openmpi-gnu.sh
    source /usr/local/profile.d/bio.sh
    mpirun /opt/Bio/mpiblast/bin/mpiblast -d human_pep -i $HOME/test/ef1a_dicdi.fsa -o $HOME/test/ef1a_dicdi.mpiblast -p blastp --removedb

Edit the script to reflect your desired number of nodes (#MOAB -l nodes= ) and to specify your database (-d), input (-i), and output (-o) file names. You also need to name the BLAST program (-p) that you want to use: blastp, blastn, blastx, tblastn, or tblastx. The --removedb option deletes the temporary Local database on each node after computation. Please DO take advantage of the --removedb option so that local compute disks do not become filled with temporary BLAST data.

Note: the line-continuation backslash character (\) is not supported in this MOAB script, so keep the mpirun command on one line only. |
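The custom-database step described above (pointing Shared in your .ncbirc at your own directory while leaving Local alone) could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes your .ncbirc stores the variable on a line of the form Shared=/some/path, and the function name set_shared_path and the example directory are hypothetical, not part of MPI-BLAST.

```shell
#!/bin/bash
# Hypothetical helper (not part of MPI-BLAST): point the Shared variable in
# an .ncbirc file at a new directory, leaving every other line untouched.
# Assumption: the file contains a line of the form "Shared=/some/path".
# Paths containing '|' or '&' would need escaping before being passed to sed.
set_shared_path() {
    local rc_file="$1" new_shared="$2"

    # Refuse to continue if there is no Shared= line to rewrite.
    grep -q '^Shared=' "$rc_file" || {
        echo "no Shared= line in $rc_file" >&2
        return 1
    }

    # Keep a backup, then rewrite only the Shared= line from the backup copy.
    cp "$rc_file" "$rc_file.bak" || return 1
    sed "s|^Shared=.*|Shared=${new_shared}|" "$rc_file.bak" > "$rc_file"
}

# Example call (hypothetical directory):
#   set_shared_path "$HOME/.ncbirc" "$HOME/blastdb"
```

The helper leaves the Local line as it found it, and the .bak copy lets you restore the site default location if you want to search the centrally partitioned databases again.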
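If you run many searches, editing the MOAB script by hand each time gets tedious. The sketch below prints a job script for a given node count, database, query file, output file, and BLAST program, keeping the mpirun command on a single line as the note above requires. The function name make_mpiblast_job and the output file name in the usage comment are hypothetical; the paths and #MOAB directives are copied from the script shown above, and file names are assumed to contain no spaces.

```shell
#!/bin/bash
# Hypothetical generator (not part of MPI-BLAST or MOAB): print a MOAB job
# script to standard output. Arguments: node count, database name, query
# file, output file, BLAST program. File names must not contain spaces.
make_mpiblast_job() {
    local nodes="$1" db="$2" query="$3" out="$4" program="$5"

    # Accept only the five BLAST programs named in the text.
    case "$program" in
        blastp|blastn|blastx|tblastn|tblastx) ;;
        *) echo "unknown BLAST program: $program" >&2; return 1 ;;
    esac

    # The mpirun command is written on one line, with no backslash
    # continuation, because continuation is not supported in the MOAB script.
    cat <<EOF
#!/bin/bash
#MOAB -l nodes=${nodes}
#MOAB -j oe
#MOAB -m abe
#MOAB -N MPIBLASTRUN
source /usr/local/profile.d/openmpi-gnu.sh
source /usr/local/profile.d/bio.sh
mpirun /opt/Bio/mpiblast/bin/mpiblast -d ${db} -i ${query} -o ${out} -p ${program} --removedb
EOF
}

# Example (hypothetical job file name):
#   make_mpiblast_job 64 human_pep "$HOME/test/ef1a_dicdi.fsa" \
#       "$HOME/test/ef1a_dicdi.mpiblast" blastp > mpiblast.job
#   msub mpiblast.job
```

Because the generator always appends --removedb, every job it produces cleans up the temporary Local database on the compute nodes.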



