Command line BLAST!

The time has finally come to run BLAST from the command line. This lets you use thousands, if not billions, of sequences as queries, and use a custom database as the reference.
Your goals are to set up and test a local BLAST, and to prepare the command that will BLAST all the metagenomic sequences produced by the Ion Proton against the proper database.

Overview

What does BLAST do? Even if you all use the online NCBI BLAST daily to align sequences, take some time to review what sequence alignment is, and discuss it with your lab mates.

  • What is the relationship between sequence similarity and biological homology?
  • Consider a sequencing run of your own genome and a metagenomic sequencing run. What are the differences in terms of sequence alignment?

Local BLAST

BLAST aligns a set of sequences called queries against a set of sequences composing the reference database. From our point of view both the query and the database are multi FASTA files.
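
Since both inputs are plain multi FASTA files, a quick sanity check before running anything is to count the header lines. The sketch below builds a tiny made-up file (the name toy_query.fa and its sequences are placeholders, not real data) and counts its records:

```shell
# Create a toy multi FASTA file (sequences are invented for illustration)
cat > toy_query.fa <<'EOF'
>seq1
ACGTACGTACGTAGCTAGCTAGCTAGCA
>seq2
TTGACCGATAGCTAGGCTAACGATCGAT
>seq3
GGCATCGATCGATTTAGCGCGATATACG
EOF

# Each record starts with a ">" header line, so counting them counts the sequences
n_seqs=$(grep -c '^>' toy_query.fa)
echo "toy_query.fa contains $n_seqs sequences"
```

The same grep command works on the database file, which is handy to check that a download was not truncated.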

formatdb

To make the database suitable for BLAST you have to format it with the “formatdb” tool provided with BLAST.

formatdb -p F -i Your_File.fa

The “-p” switch must be T (True) for protein databases and F (False) for nucleotide ones. The “-i” switch specifies the input file. The program automatically saves three files with the same name as the input file but with different file extensions. Try to see them with “ls”.
NOTE: This program must not be run on the query file!
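
As a minimal sketch of this step, the commands below format a toy nucleotide database and list the resulting files. The file name toy_db.fa and its sequences are invented for illustration, and the snippet assumes the legacy formatdb tool may or may not be installed, so it checks first:

```shell
# Toy nucleotide database (invented sequences; the file name is a placeholder)
cat > toy_db.fa <<'EOF'
>ref1
ACGTACGTACGTAGCTAGCTAGCTAGCA
>ref2
TTGACCGATAGCTAGGCTAACGATCGAT
EOF

# Format it only if the legacy formatdb tool is available on this machine
if command -v formatdb >/dev/null 2>&1; then
    formatdb -p F -i toy_db.fa
    # For a nucleotide database the index files usually end in .nhr, .nin and .nsq
    ls -l toy_db.fa.*
else
    echo "formatdb not found: install the legacy NCBI BLAST package first"
fi
```

Note that the FASTA file itself is left untouched: the index files are simply written next to it.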

blastall

The actual alignment is performed with the blastall program. Here is a simplified syntax:

blastall -p program -i query_file -d database_file > output.txt

Where:

-p  to set the proper program (blastn, blastp, blastx, tblastn,…)
-i  to specify the query file (multi FASTA)
-d  to specify the database file (multi FASTA, but it has to be formatted first)
-m  to change the output format. Use 8 for tabular format!
-e  to set the e-value threshold (hits with a higher e-value are not reported)
-a  to set the number of CPUs to use. Multicore processors can speed up the search.
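
To see how these switches combine, here is a hedged end-to-end sketch on toy data. The file names, the sequences, the e-value cutoff of 1e-5 and the choice of 2 CPUs are all assumptions made for illustration; the snippet skips the alignment if the legacy tools are not installed:

```shell
# Placeholder file names: replace them with your own query and database
QUERY=toy_query.fa
DB=toy_db.fa

# Invented sequences, only to make the sketch self-contained
printf '>read1\nACGTACGTACGTAGCTAGCTAGCTAGCA\n' > "$QUERY"
printf '>ref1\nACGTACGTACGTAGCTAGCTAGCTAGCA\n>ref2\nTTGACCGATAGCTAGGCTAACGATCGAT\n' > "$DB"

if command -v blastall >/dev/null 2>&1 && command -v formatdb >/dev/null 2>&1; then
    # The database (not the query) has to be formatted first
    formatdb -p F -i "$DB"
    # blastn search, tabular output, e-value cutoff 1e-5, 2 CPUs
    blastall -p blastn -i "$QUERY" -d "$DB" -m 8 -e 1e-5 -a 2 > blast_tab.txt
    head blast_tab.txt
else
    echo "blastall/formatdb not found: skipping the alignment"
fi
```

In the tabular format each line describes one hit with twelve tab-separated columns: query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value and bit score.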

NOTE: When referring to the

BLASTing against 16S database

To save space on disk we have a common database (already formatted):

/home/geno/files/16S.12_12upgrade.new.seq

Try blasting your own file (the one downloaded with wget) against this database!
Try both the default output (save it as blast_default.txt in your dir-01 directory) and the tabular output (save it as blast_tab.txt in the same directory).
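
One possible way to lay out the two runs is sketched below. The query name my_sequences.fa is hypothetical (use the file you actually downloaded), and blastn is assumed because both the 16S database and the query are expected to be nucleotide sequences:

```shell
DB=/home/geno/files/16S.12_12upgrade.new.seq   # shared, already formatted database
QUERY=${QUERY:-my_sequences.fa}                # hypothetical name: use your own file
OUTDIR=dir-01

mkdir -p "$OUTDIR"

if command -v blastall >/dev/null 2>&1 && [ -f "$QUERY" ]; then
    # Default (pairwise alignment) output
    blastall -p blastn -i "$QUERY" -d "$DB" > "$OUTDIR/blast_default.txt"
    # Tabular output
    blastall -p blastn -i "$QUERY" -d "$DB" -m 8 > "$OUTDIR/blast_tab.txt"
else
    echo "blastall or $QUERY not available: nothing was run"
fi
```

Comparing the two output files side by side is a good way to see what the tabular format leaves out, namely the alignments themselves.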

 

 
