Dear all, if you wish to see the correct answer for last Friday’s test…
Metasolved Metagenomics Metatest
1. The file “appello.txt” is a list of the names of students of this course. Each line corresponds to one name.
(a) Write the shell command to print this list on the terminal.
cat appello.txt
(b) What is the command to count the number of names in the list?
cat appello.txt | wc -l
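As a minimal sketch of the two commands in question 1, the snippet below builds a small stand-in for appello.txt (the three names are invented for illustration) and then prints and counts it. Reading the file through a redirection is an alternative to piping from cat; the result is the same.

```shell
# Hypothetical stand-in for appello.txt, created in a temporary directory.
tmpdir=$(mktemp -d)
printf 'Anna Rossi\nLuca Bianchi\nMarta Verdi\n' > "$tmpdir/appello.txt"

# (a) Print the list on the terminal.
cat "$tmpdir/appello.txt"

# (b) Count the names. With "<" wc prints only the number, and the
# arithmetic expansion strips any padding that some wc versions add.
n_names=$(( $(wc -l < "$tmpdir/appello.txt") ))
echo "names: $n_names"
```

Note that wc -l counts newline characters, so a last name that is not followed by a newline would not be counted.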
2. A service provider just sent you the results of the sequencing run of your sample. The DNA that you sent was that of a bacterium that grows in cucumbers, and you wanted to sequence its whole genome. Given that the results were delivered as a FASTQ file, how can you determine the number of sequences contained in that file?
a) Write down the solution that you found
In the FASTQ file each sequence is described by 4 lines:
- line 1 begins with a ‘@’ character and is followed by a sequence identifier
- line 2 is the sequence
- line 3 begins with a ‘+’ character
- line 4 encodes the quality values for the sequence in line 2
Therefore I will print the file to the terminal, count the lines and divide the result by 4.
b) Report the command line
cat sequences.fastq | wc -l
The number of sequences is this line count divided by 4.
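The count-and-divide step can also be done in one go by the shell. The sketch below assumes a plain 4-line-per-record FASTQ file; the two-record file it creates is a made-up example so that the snippet runs on its own.

```shell
# Hypothetical two-record FASTQ file, created only for this example.
tmpdir=$(mktemp -d)
fastq="$tmpdir/sequences.fastq"
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' > "$fastq"

# Count the lines, then divide by 4 (one record = 4 lines).
n_lines=$(( $(wc -l < "$fastq") ))
n_reads=$(( n_lines / 4 ))
echo "lines: $n_lines  reads: $n_reads"

# A remainder suggests a truncated file or a FASTQ with wrapped lines.
if [ $(( n_lines % 4 )) -ne 0 ]; then
    echo "warning: line count is not a multiple of 4" >&2
fi
```

Dividing the line count is safer than counting the lines that start with '@', because '@' is also a valid quality character and can appear at the start of a quality line.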
3. The sequencing service sent you the FASTQ file containing the results of a sequencing run where they loaded some amplicons that you submitted. The file contains 20,309,291 sequences. You want to count how many of them contain the primer that you used for amplification (GTGCCAGCAGCCGCGGTAA). How can you do that using shell commands?
cat sequences.fastq | grep 'GTGCCAGCAGCCGCGGTAA' | wc -l
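The same count can be obtained with grep -c, which prints the number of matching lines directly. The sketch below uses a made-up three-record file (two reads carry the primer) and also shows a stricter variant that looks only at the sequence lines, assuming a 4-line-per-record FASTQ.

```shell
# Hypothetical three-record FASTQ file; reads r1 and r3 contain the primer.
tmpdir=$(mktemp -d)
fastq="$tmpdir/sequences.fastq"
primer='GTGCCAGCAGCCGCGGTAA'
cat > "$fastq" <<EOF
@r1
${primer}ACGT
+
IIIIIIIIIIIIIIIIIIIIIII
@r2
ACGTACGTACGT
+
IIIIIIIIIIII
@r3
TT${primer}
+
IIIIIIIIIIIIIIIIIIIII
EOF

# Count the lines that contain the primer (-F: literal string, -c: count).
n_hits=$(grep -c -F "$primer" "$fastq")

# Stricter variant: keep only line 2 of every record before searching.
n_hits_seq=$(awk 'NR % 4 == 2' "$fastq" | grep -c -F "$primer")

echo "reads with primer: $n_hits (sequence lines only: $n_hits_seq)"
```

Both variants count each read once even if the primer occurs several times in it, because grep counts matching lines rather than matches.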
4. You have the sequences of gDNA that we produced during the didactic laboratories.
(a) Which of the following blast commands is appropriate if you want to align your sequences against a 16S database?
(b) Which of the following blast commands is appropriate if you want to align your sequences against a database of protein sequences?
5. Assign the correct action to each of the following tasks. Type 1 for NCBI BLAST and 2 for local BLAST using the command line:
(a) I want to align the sequence of a plasmid of unknown source
(b) I want to align a sequence against the genome of the bacterium that I just sequenced
(c) I want to align all the sequences described in question 3 against a database of 16S
6. You have a file “sequenze_gdna.fasta” that contains all the reads of gDNA produced during the didactic laboratories. You want to align those reads against the database of completely sequenced bacterial genomes (“bacterial_genomes.fasta”).
(a) List the parameters that you must pass to BLAST.
The necessary parameters for blastall to work are: the type of alignment that I want to perform (e.g. blastp, blastn); the input file or query; and the reference database.
(b) Write the command line to launch BLAST and save the results in the file “aligned.txt”.
blastall -p blastn -i sequenze_gdna.fasta -d bacterial_genomes.fasta -o aligned.txt
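For readers who have the newer NCBI BLAST+ suite instead of the legacy blastall program, the sketch below shows what the equivalent run might look like: makeblastdb indexes the reference FASTA and blastn performs the nucleotide search. The guard and the blast_status variable are additions for this example only, so that the snippet does nothing when BLAST+ or the input files are missing.

```shell
query="sequenze_gdna.fasta"
db_fasta="bacterial_genomes.fasta"
out_file="aligned.txt"

if command -v makeblastdb >/dev/null 2>&1 && command -v blastn >/dev/null 2>&1 \
   && [ -f "$query" ] && [ -f "$db_fasta" ]; then
    # Index the reference FASTA as a nucleotide database (needed only once).
    makeblastdb -in "$db_fasta" -dbtype nucl
    # Nucleotide-vs-nucleotide search; -out plays the role of blastall's -o.
    blastn -query "$query" -db "$db_fasta" -out "$out_file"
    blast_status="ran"
else
    echo "BLAST+ or the input files are not available here; nothing was run." >&2
    blast_status="skipped"
fi
```

With legacy blastall the reference must likewise be indexed first, which is done with formatdb -i bacterial_genomes.fasta -p F.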
7. Which of the following databases (from the list below) is appropriate in each of the following cases (more than one association allowed):
[a] gDNA sequences produced during the didactic laboratories [5; 1]
[b] rRNA sequences produced during the didactic laboratories 
[c] the sequence of a protein known to be toxic to humans and found in some bacterial strains [2; 5]
[d] the sequences obtained after the amplification of the HLA locus of the students of this class 
Databases:
1) 16S genes of known bacteria
2) cucumber bacteria whole genome
3) human chromosomes
4) NCBI nr database
5) all the available bacterial genomes
8. Look carefully at the data reported below and answer the questions:
Results of the classification of the sequences obtained from the sequencing of gDNA from sample “S1685”:
Arcobacter butzleri 505
Arcobacter cryaerophilus 482
toluene-degrading bacterium UCR 021t 263
toluene-degrading bacterium UCR 021e 193
Salmonella enterica 150
Arcobacter sp. F79-6 107
Escherichia coli 106
Results of the classification of the sequences obtained from the sequencing of 16S amplicons from sample “S1685”. Primers were selective for eubacteria:
Arcobacter butzleri 31,160
Arcobacter cryaerophilus 22,760
Escherichia coli 890
(a) Do you observe differences?
- I see a greater variety of organisms classified in the gDNA sample compared to the 16S sample
- When one organism has about twice as many reads as another in the gDNA sample (e.g. Arcobacter 999; Arcobacter butzleri 505), in the 16S sample the same two organisms differ in abundance by 10-20 times (e.g. Arcobacter 686,640; Arcobacter butzleri 31,160)
- The absolute number of alignments for each group of organisms is higher in the 16S sample than in the gDNA sample
(b) Could you explain / advance hypothesis concerning the differences that you observe?
- The primers are specific for eubacteria, therefore “Thaumarchaeota”, being Archaea, were not amplified. Other organisms, despite being eubacteria, were not amplified efficiently either.
- The proportionality of the reads was altered by the PCR amplification.
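The change in proportionality can be checked with a little arithmetic on the counts listed in question 8. The sketch below compares the ratio of Arcobacter butzleri reads to Escherichia coli reads in the two data sets; the choice of these two organisms is only an illustration, made because both appear in both lists.

```shell
# Read counts copied from the two result lists of question 8.
gdna_butzleri=505;   gdna_ecoli=106
s16_butzleri=31160;  s16_ecoli=890

# awk does the floating-point division; LC_ALL=C keeps "." as decimal mark.
ratio_gdna=$(LC_ALL=C awk -v a="$gdna_butzleri" -v b="$gdna_ecoli" \
    'BEGIN { printf "%.2f", a / b }')
ratio_16s=$(LC_ALL=C awk -v a="$s16_butzleri" -v b="$s16_ecoli" \
    'BEGIN { printf "%.2f", a / b }')

echo "A. butzleri : E. coli in gDNA reads = $ratio_gdna"
echo "A. butzleri : E. coli in 16S reads  = $ratio_16s"
```

The ratio rises from about 4.8 in the gDNA reads to about 35 in the 16S amplicon reads, which is consistent with the hypothesis that amplification distorted the relative abundances.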
9. You purified the total nucleic acids (gDNA and total RNA) from an environmental sample. You want to perform PCRs on this sample and you want to know the exact amount of template DNA that you will put in each reaction tube. How would you quantify the purified DNA to achieve this goal (choose among the methods that we discussed during the labs)? Can you explain your choice?
I would use either the Qubit DNA kit, which quantifies the fluorescence emitted by a fluorophore that binds specifically to DNA and does not register the RNA signal, or gel electrophoresis, where I can look specifically at the fluorescence of the DNA band and compare it to a band of known concentration.
10. You want to identify the bacteria that are responsible for the gasification of organic waste into methane. You are given the opportunity to collect samples from a production plant for your studies. In this plant, every Monday the bioreactors are filled with organic waste and sealed for anaerobiosis, and their methane emission is measured for one week. The plant consists of three bioreactors. In each of them the production of methane begins after a different period every time. This week bioreactor 3 started producing methane after 2 days, bioreactor 2 after 3 days and bioreactor 1 after 5 days. You aim to identify the bacteria responsible for this fermentation in order to inoculate them every Monday together with the waste and boost the plant productivity by bringing the methane emission forward to day 1 in all the bioreactors. When (which day) and where (which bioreactor) would you collect the samples for your studies? How would you store them if it is necessary to wait before running your metagenomic experiments?
I would collect samples from all 3 bioreactors at day 1, immediately after inoculation of the waste. I would then collect one sample from each bioreactor immediately after the production of methane has started. With this experimental set-up I should be able to identify the species whose concentration increases specifically at the moment of methane production in comparison with the moment of waste inoculation. Comparison of the three samples would be crucial to obtain a statistically significant result; moreover, it may give some hints about the different time courses of the three inocula (e.g. they might have different starting concentrations of the useful bacteria or of competitor bacteria that grow on the same substrate but do not emit methane). I would flash-freeze the samples immediately after collection and keep them at -80°C until the moment of extraction.
I would thaw them either directly in extraction buffer or in a buffer that protects nucleic acids from enzymatic degradation.