Day2: mastering (command line) alignments

Welcome to the second day of our bioinformatics practicals. Start as usual with the “appello Name Surname” command so that we do our automatic head count.
As a small update this morning the Proton run of our (meta)genomic sample finished giving us 5Gbp of data!

Which resources?

Your goal here is to be creative and think the (endless) possibilities you have to analyse the Ion Proton sequences.

Referring to the small preview of the sequencing results use any online tool you wish (take some time to Google for rRNA classification, metagenomics data analysis and so and forth) to annotate those sequences. Critical thinking is important, as there is no “right tool” but a correct use of tools. As we said yesterday, BLAST is powerful and certainly would fit… as long as we remember its limitations.
Create a “preview.txt” file in the “dir-01” directory annotating:

  1. The tools used (paste all the internet address you accessed)
  2. The results
  3. Your idea about those tools

When writing a report, consider that a person reading it should find it useful. 🙂

BLAST locally (think globally)

Your goal here is to be able to set up a BLAST search in your computer (creating a custom database).

Your first exercise is to perform a local BLAST (instructions). This small test is to learn using BLAST and interpreting the output…

  1. Create a “blast-demo” directory in your home folder.
  2. Now copy and paste the Proton sequences we published into a file called “reads.fasta”
  3. Execute the “formatdb” command against the user_XX.fa file you downloaded (and annotated) yesterday [should be in the “dir-01” directory. Keep it there!]
  4. Now considering that reads.fasta are your “query” and that the user_XX.fa will be considered your database: execute the blastn search and save it into a file called “blast_output.txt”.
  5. Repeat step 4 adding the “-m 8” parameter, this time saving the output as “1blast_output.tab”
  6. Read the two files you created [they should be both in the blast-demo directory]

You made big things! Megan will use BLAST output to help you analyze metagenomics samples! Of course we’ll have to make a make a much complex BLAST using all the Proton reads as query and a comprehensive 16S database as a reference.

PASS to another aligner

Now we’ll use a “next generation” aligner: one of the (many) programs designed to align millions (short) sequences in very small times.

BLAST is a popular aligner but the advent of NGS, and its huge amount of sequences to be aligned, made it too slow for practical uses. One of the faster aligner produced for NGS machines is PASS (Campagna et al. 2009) and has been written in our lab.

You can invoke it using the command “pass“, that will print a (rather long) list of options.
The basic usage is:

pass -fasta short.fasta -d reads.fasta -fid 90 -sam > output.sam

If your query sequence is in FASTQ format use -fastq instead of -fasta. The FASTQ format contains a quality score for each base, and it’s useful to trim low quality regions prior alignment (a step done by PASS itself). “-fid” requires a minimum percentage of identity, while “-sam” requires output in SAM format.

Create a “short.fasta” file with the following sequences:

>read1
GTTGTTGGGTCTTCACTGACTC
>read2
GCTCTGGTACTGAGATTGTTTC
>read3
AGACGCCAGTTTCTGTGGAGCCGCCCTAATGAAATAC

Try running pass with the same sequences you used in the previous exercise (so substitute short_reads.fastq with the FASTA file you made and reference.fasta with the user_XX.fa file). This time call the output “pass.sam”. Continue operating in the “blast-demo” directory. As you see PASS will print both the output and a series of information about the alignment. To save those information (called “log”) you can add this to the command 2> pass_log.txt. In general terms the “>” redirects the output of a program inside a file, while “2>” will redirect the log to a file. So let’s try again:

pass -fastq short_reads.fastq -d ref.fasta -fid 90 -sam > output1.sam 2> log1.txt

Now you learnt that with “cat” you can see the content of a file. When the file is so long you can use a more complex program, called “less“.

[youtube http://www.youtube.com/watch?v=TyhLIR-FkJQ]

Type:

less log1.txt

To see the content of that file, page by page. You can scroll with with arrows, skip page with the “space bar”. To skip to the begin/end of the file use “g”/”G” buttons. To exit type “q”.

High Performance Computing

When performing intensive tasks we need powerful servers. The HPC cluster at the CRIBI works with a “queue” master as described by your tutors. Let’s see how to use it.

The command you would run in your PC:

blastall -i sequences.fasta -d 16S.fasta -a 12 -m 8 > output.blast

How to put that command in queue:

sendcmd -c "blastall -i sequences.fasta -d 16S.fasta -m 8 > output.blast"

This commands send the command specified (with quotes!) with the -c parameter. To see if your command has been sent to the queue use the “qstat” command.

Try writing the command to perform a BLAST having the sequences stored in /home/geno/files/rrna.fastq and the reference in /home/geno/files/16S.fa. We want a tabular output saved in your home (called “rrna_blast.txt”).

When done send the command with sendcmd.

Note: don’t shut down your PC! Please choose “Termina sessione…” from the upper right menu (gear like icon)

One thought on “Day2: mastering (command line) alignments

Leave a Reply to Day3: A look at our alignments | Metagenomics Cancel reply

Your email address will not be published. Required fields are marked *