HMMER: Hidden Markov Models for sequence analysis
HMMER is part of the Rocks Bio Roll and is available on all HPC nodes. It is a suite of programs that uses Hidden Markov Models (HMMs) to describe the profile of a multiple sequence alignment. This profile can in turn be used to run highly sensitive database searches and to merge ever more distantly related sequences into the original alignment. You can also search a protein sequence against a profile library to detect known domains. HMMER is also included in the GCG package, and it may be more intuitive to run it within that context.
Using HMMER on HPC
The package consists of nine programs: hmmalign, hmmbuild, hmmcalibrate, hmmconvert, hmmemit, hmmfetch, hmmindex, hmmpfam, and hmmsearch.
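As a rough illustration of how several of these programs fit together, the sketch below prints (rather than runs) a typical sequence of commands: build a profile from an alignment, calibrate it, and search a sequence library with it. The `run` helper, the `DRY_RUN` variable, the `hmmer_workflow` function, and the file names are hypothetical, and the argument order assumes the usual HMMER 2 forms (`hmmbuild <hmmfile> <alignment>`, `hmmcalibrate <hmmfile>`, `hmmsearch <hmmfile> <seqfile>`); check each program's "-h" output before relying on it.

```shell
#!/bin/bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around a build -> calibrate -> search session.
# Arguments: <alignment file> <output HMM file> <sequence library>
hmmer_workflow() {
    local aln=$1 hmm=$2 seqdb=$3
    run hmmbuild "$hmm" "$aln"            # build a profile HMM from the alignment
    run hmmcalibrate "$hmm"               # calibrate the profile's E-value statistics
    run hmmsearch -E 1.0 "$hmm" "$seqdb"  # search the library with the profile
}

# Usage (prints the three commands; set DRY_RUN=0 to execute them):
#   hmmer_workflow globins.aln globins.hmm "$BLASTDB/uniprot.fsa"
```

Printing the commands first makes it easy to confirm file names and options before committing CPU time to a real search.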
All of these programs support the "-h" help flag, e.g., hmmbuild -h.
Two Pfam databases are provided in hmmpfam's default $HMMERDB location: Pfam.hmmdb, with global HMM profiles, and PfamFrag.hmmdb, with local HMM profiles. Two FASTA-format sequence library files for hmmsearch are located in the default $BLASTDB directory: uniprot.fsa and rs_prot.fsa.
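Before submitting a long job, it can be worth confirming that these four files are actually readable from your environment. The following is a minimal sketch, assuming $HMMERDB and $BLASTDB are already set as described above; the function name `check_hmmer_dbs` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: report which of the expected database files are readable.
# Returns 0 if all four are present, 1 if any is missing.
check_hmmer_dbs() {
    local missing=0 f
    for f in "${HMMERDB:-}/Pfam.hmmdb" "${HMMERDB:-}/PfamFrag.hmmdb" \
             "${BLASTDB:-}/uniprot.fsa" "${BLASTDB:-}/rs_prot.fsa"; do
        if [ -r "$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_hmmer_dbs || echo "One or more HMMER databases are not available" >&2
```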
Two of the HMMER programs, hmmpfam and hmmsearch, are particularly CPU-intensive. We provide MPI versions of both, launched with the commands mpihmmpfam and mpihmmsearch, respectively. They were compiled against the GNU build of OpenMPI, so that environment needs to be sourced in your submission script. Here's a sample MOAB script for mpihmmpfam:
#!/bin/bash
#MOAB -l nodes=25
#MOAB -j oe
#MOAB -m abe
#MOAB -N HMMPFAM-OPENMPI
source /usr/local/profile.d/openmpi-gnu.sh
mpirun /opt/Bio/hmmer/bin/mpihmmpfam \
    $HMMERDB/Pfam.hmmdb $HOME/test/protein.fsa
And here's one for an mpihmmsearch run against the GCG BLAST UniProt database:
#!/bin/bash
#MOAB -l nodes=25
#MOAB -j oe
#MOAB -m abe
#MOAB -N HMMSEARCH-OPENMPI
source /usr/local/profile.d/openmpi-gnu.sh
mpirun /opt/Bio/hmmer/bin/mpihmmsearch -E 1.0 \
    $HOME/test/protein.hmm $BLASTDB/uniprot.fsa
The backslash continuation character ("\") does not work in MOAB scripts; it is used here only to show that the entire mpirun command must be on a single line. Consider running HMMER programs through GCG for ease of use, unless you need to run them on multiple nodes and/or through scripts.
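One way to avoid an accidental line break in the mpirun command is to generate the submission script rather than type it by hand. The sketch below is an illustration only: the function name `make_moab_script` is hypothetical, and the #MOAB directives and the OpenMPI profile path are copied from the sample scripts above.

```shell
#!/bin/bash
# Hypothetical helper: write a MOAB script to standard output with the whole
# mpirun command on one line.
# Arguments: <job name> <node count> <program> [program arguments...]
make_moab_script() {
    local name=$1 nodes=$2
    shift 2
    local cmd="$*"   # join the program and its arguments with single spaces
    cat <<EOF
#!/bin/bash
#MOAB -l nodes=$nodes
#MOAB -j oe
#MOAB -m abe
#MOAB -N $name
source /usr/local/profile.d/openmpi-gnu.sh
mpirun $cmd
EOF
}

# Usage (single quotes keep $HMMERDB and $HOME unexpanded until the job runs):
#   make_moab_script HMMPFAM-OPENMPI 25 /opt/Bio/hmmer/bin/mpihmmpfam \
#       '$HMMERDB/Pfam.hmmdb' '$HOME/test/protein.fsa' > hmmpfam.moab
```

The backslash in the usage comment is interpreted by your interactive shell, not by MOAB, so the generated file still contains a single-line mpirun command.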