|FastA: A highly sensitive sequence comparison suite|
Pearson's FastA package is included in the Rocks Bio Roll, where it is available on all HPC nodes, and in the GCG package, where it is available only on the General Access submit head node. It is a suite of about a dozen sequence similarity tools that compare protein or nucleotide queries against sequence databases, including several kinds of translated searches. It can be more sensitive than BLAST, especially for untranslated DNA-to-DNA searches, which are often unavoidable in non-coding genomic analysis.
Using FastA on HPC
Activating the Bio Roll adds all of the FastA programs to your login $PATH. Because GCG sequence databases can be specified directly on the command line, there is seldom any need to build your own local databases. For example, to search all of UniProt with a protein query:
fasta35 test.fsa.pep \
    @/panfs/storage.local/bio/gcgdata/fastadb/uniprot.fastadb
Note the "@" prefix: it tells FastA that the named file is an indirect library file, that is, a list of the sequence files to be searched. Look in:
to see all of the available GCG library files. To make library specification easier, you may want to create a FASTLIBS file and set the FASTLIBS environment variable to point to it; see the fasta3x.doc file for details. Unless you need multi-node or scripting capability, consider running the FastA programs from GCG's SeqLab GUI, which is easier to use. For multi-node runs, see the MPI version of the FastA programs under mpiFastA.
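A FASTLIBS setup might look roughly like the following minimal sketch. It writes a one-line FASTLIBS file for the UniProt library used in the fasta35 example and exports the FASTLIBS variable. The line layout (a description, then "$", then 0 for a protein or 1 for a DNA library, then a one-letter library code, then the file name, here with the "@" indirect prefix) is an assumption based on the convention described in fasta3x.doc; verify it there before relying on it. The file name ~/fastlibs, the description text, and the letter code U are hypothetical choices.

```shell
# Minimal sketch of a personal FASTLIBS setup for FastA 3.x.
# ASSUMPTIONS: the path ~/fastlibs, the description "UniProt (GCG)" and the
# one-letter code "U" are hypothetical; check fasta3x.doc for the exact format.

# Where to write the file (override by setting FASTLIBS_FILE beforehand).
# NOTE: this overwrites any existing file at that path.
FASTLIBS_FILE="${FASTLIBS_FILE:-$HOME/fastlibs}"

# Assumed line layout: description, "$", 0 (protein) or 1 (DNA),
# one-letter library code, then the library file name. The leading "@"
# marks an indirect file (a list of library files), as in the fasta35 example.
# The quoted 'EOF' keeps the shell from expanding "$0" inside the here-document.
cat > "$FASTLIBS_FILE" <<'EOF'
UniProt (GCG)$0U@/panfs/storage.local/bio/gcgdata/fastadb/uniprot.fastadb
EOF

# Point FastA at the file; add these two lines to your login
# startup file (e.g. ~/.bashrc) to make the setting permanent.
FASTLIBS="$FASTLIBS_FILE"
export FASTLIBS
```

Source the script (for example with ". ./setup_fastlibs.sh", a hypothetical file name) rather than executing it, so that the exported FASTLIBS variable persists in your current login shell. Add one line per library you search regularly. The exact command-line syntax for selecting a library by its letter code is described in fasta3x.doc.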