BLAST: Basic Local Alignment Search Tool

NCBI's BLAST is included in the Rocks Bio Roll and in the GCG package on all HPC nodes. It is the world's most popular sequence similarity search tool and can be used to find similarities between protein or nucleotide query sequences and sequence databases, including various translated searches (for example, a protein query against a translated nucleotide database).

Using BLAST on HPC

The Rocks implementation of BLAST is described in the Bio Roll documentation and in the built-in help of the blastall command:

blastall --help

The same BLAST databases are available to the Rocks and GCG versions of BLAST:

    GenBank (genbank) and, as separate databases:
    Human Expressed Sequence Tags (est_human)
    Mouse Expressed Sequence Tags (est_mouse)
    Other Expressed Sequence Tags (est_other)
    High Throughput Genomic sequences (htg)
    High Throughput cDNA sequences (htc)
    Genome Survey Sequence (gss)
    GenPept (genpept)
    RefSeq RNA and Protein (rs_rna and rs_protein)
    UniProt (uniprot)

The bio.sh or bio.csh initialization script sets the environment variable $BLASTDB to point to these databases. After sourcing the script, launch the program with a command along the following lines (preferably from within a job submitted through MOAB):

blastall -d uniprot -p blastp -i test.pep.fsa -o test.pep.blastp
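
The same pattern carries over to other program and database pairs. As a minimal sketch, the helper below prints the one-line blastall command for a given program, database, and query file, so the command can be checked before it is run. The function name blast_cmd is hypothetical (it is not part of BLAST or the Bio Roll), and the output file name assumes the query file ends in .fsa, as in the example above.

```shell
#!/bin/bash
# Hypothetical helper (not part of BLAST or the Bio Roll): print a one-line
# blastall command for a given program, database, and query file.
blast_cmd() {
    local program=$1 db=$2 query=$3
    # Output name follows the convention used above:
    # strip the .fsa suffix and append the program name.
    local out="${query%.fsa}.${program}"
    printf 'blastall -d %s -p %s -i %s -o %s\n' "$db" "$program" "$query" "$out"
}

# bio.sh or bio.csh normally sets BLASTDB; warn if it is missing.
if [ -z "${BLASTDB:-}" ]; then
    echo "warning: BLASTDB is not set; source bio.sh first" >&2
fi

# Protein query against UniProt, then against translated human ESTs.
blast_cmd blastp uniprot test.pep.fsa
blast_cmd tblastn est_human test.pep.fsa
```

The helper only prints the command. Once the printed line looks right, paste it into a job script or run it directly.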

Here's a sample MOAB submission script showing how it could be done with tBLASTn to search a protein query against the Genome Survey Sequence (gss) section of GenBank:

#!/bin/bash
# For other MOAB msub options, see:
# http://www.clusterresources.com/
#MOAB -j oe
#MOAB -m abe
#MOAB -N tBLASTn
source /usr/local/profile.d/bio.sh
blastall -d gss -p tblastn -i $HOME/test/test.pep.fsa \
 -o $HOME/test/test.pep.tblastn

Note: the backslash (\) continuation character does not work within MOAB scripts; it is used here only to fit the command on the page. In an actual script, the whole command must be on one line. Consider using BLAST inside GCG for ease of use, unless you need multiple nodes or scripting capability.
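
One way to avoid a stray continuation character is to generate the job script rather than type it. The sketch below writes the sample script to a file with the blastall command on a single line. The file name tblastn_job.sh and the variable names are assumptions made for this example; the directives and paths are the ones from the sample script.

```shell
#!/bin/bash
# Write a MOAB job script whose blastall command sits on a single line.
# JOB_SCRIPT is a hypothetical file name chosen for this example.
JOB_SCRIPT=${JOB_SCRIPT:-tblastn_job.sh}
DB=gss
PROG=tblastn
# Single quotes keep $HOME literal so it is expanded when the job runs.
QUERY='$HOME/test/test.pep.fsa'
OUT='$HOME/test/test.pep.tblastn'

cat > "$JOB_SCRIPT" <<EOF
#!/bin/bash
#MOAB -j oe
#MOAB -m abe
#MOAB -N tBLASTn
source /usr/local/profile.d/bio.sh
blastall -d $DB -p $PROG -i $QUERY -o $OUT
EOF

echo "wrote $JOB_SCRIPT"
```

The generated file can then be submitted with msub in the usual way.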