December 22, 2009

Deciding what genomes to sequence



We’re aiming to sequence three genomes from three different cave environments, where each cave differs in the degree of nutrient starvation. We will sequence P. fluorescens isolates from each cave and examine how the genomes have adapted to the starved cave environments compared with the available genomes of P. fluorescens from soil or plants.

We’ll be using Roche/454 sequencing, which provides ~800,000 reads of genomic DNA, each approximately 500 nucleotides in length. In total a single 454 plate should provide 400 Mbp of sequence data (800,000 reads × 500 nt). Previous sequencing of P. fluorescens has shown that the genome size is just under 7 Mbp, so sequencing a single P. fluorescens genome would generate 400 Mbp / 7 Mbp = 57X coverage. We would, however, like to sequence multiple genomes, and there are two options for this.
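As a quick sanity check, the arithmetic above can be sketched in a few lines of Python. The function and constant names are my own, and the genome size is rounded up to 7 Mbp, so treat this as a rough estimate rather than a prediction of what a real plate will deliver.

```python
# Figures quoted in the post; real plate yields will vary.
READS_PER_PLATE = 800_000   # approximate reads per 454 plate
READ_LENGTH_NT = 500        # approximate read length in nucleotides
GENOME_SIZE_BP = 7_000_000  # P. fluorescens, "just under 7 Mbp", rounded up


def plate_yield_bp(reads=READS_PER_PLATE, read_length=READ_LENGTH_NT):
    """Total sequenced bases expected from one plate."""
    return reads * read_length


def coverage(total_bp, genome_size_bp=GENOME_SIZE_BP, n_genomes=1):
    """Average fold coverage when total_bp is shared equally by n_genomes."""
    return total_bp / (genome_size_bp * n_genomes)


yield_bp = plate_yield_bp()
print(f"{yield_bp / 1e6:.0f} Mbp per plate, {coverage(yield_bp):.0f}X for one genome")
```

Passing a different `n_genomes` gives the per-genome figure when one plate is shared evenly between several isolates.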

Rubber grid

A rubber grid can be placed over the sequencing plate to divide it into individual segments, where each segment can be used to sequence a genome. The rubber grid, however, covers approximately one third of the sequencing plate and will reduce the yield from 400 Mbp to roughly 266 Mbp. Therefore, if we sequenced four P. fluorescens genomes using the rubber grid to divide the plate, this would theoretically provide 266 Mbp / (4 × 7 Mbp), or roughly 10X coverage for each genome.
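The effect of the grid on per-genome coverage can be sketched as below. The one-third loss is the approximate figure quoted above, not a measured value, and the function name is just for illustration.

```python
def gasket_coverage(plate_bp, genome_bp, n_genomes, lost_fraction=1 / 3):
    """Per-genome coverage when a rubber grid masks part of the plate.

    lost_fraction is the share of the plate covered by the grid
    (assumed here to be about one third).
    """
    usable_bp = plate_bp * (1 - lost_fraction)
    return usable_bp / (n_genomes * genome_bp)


cov = gasket_coverage(plate_bp=400e6, genome_bp=7e6, n_genomes=4)
print(f"{cov:.1f}X per genome")  # about 9.5X, i.e. roughly 10X
```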

Sequence tags

The second approach to sequencing multiple genomes involves using sequence tags. Each DNA fragment has a small oligonucleotide tag attached, where each tag is unique to one of the P. fluorescens isolates. As the fragment is sequenced the tag is also sequenced, and this allows the sequenced DNA to be attributed to a source genome based on the attached tag. There is therefore no need to use a rubber grid, and the full 400 Mbp of sequence data can be produced from the plate. This could theoretically provide 14X coverage for 4 genomes, or 11X coverage for 5 genomes. This may seem like the obvious choice, but the sequencing facility has not used sequence tags before, so there is a risk in trying something for the first time.
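To make the tagging idea concrete, here is a minimal sketch of how reads could be assigned back to isolates. The tag sequences, isolate names and reads are all made up for illustration. Real tags are typically longer, and real demultiplexing software usually tolerates sequencing errors in the tag, which this exact-match version does not.

```python
from collections import defaultdict

# Hypothetical 4-nt tags and isolate names, for illustration only.
TAGS = {"ACGT": "cave_A", "TGCA": "cave_B", "GATC": "cave_C"}
TAG_LENGTH = 4


def demultiplex(reads, tags=TAGS, tag_length=TAG_LENGTH):
    """Group reads by their leading tag and strip the tag from each read.

    Reads whose tag matches no known isolate go into 'unassigned'.
    """
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        bins[tags.get(tag, "unassigned")].append(insert)
    return dict(bins)


reads = ["ACGTTTGACC", "TGCAGGATAC", "ACGTCCCTAG", "NNNNACGTAC"]
binned = demultiplex(reads)
for isolate, inserts in sorted(binned.items()):
    print(isolate, inserts)
```

Counting the bases in each bin and dividing by the genome size would then give the coverage actually achieved per isolate, which may be uneven if some tagged libraries are over-represented on the plate.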

Making a decision

Over the next few days we have to decide the aim of this project, especially since my funding is only for one year. One option is to sequence one genome from each cave and then compare the cave genomes with existing P. fluorescens genomes. This would give some indication of how the genomes differ between caves and how they are adapted to caves.

A second approach could be to sequence multiple genomes from two caves. This would allow examination not only of how each cave has shaped the genome but also of how variable the genome is within the same species in the same environment.

  • I like both possibilities. I would probably base the decision a little bit on the extent of the differences between caves. They both seem "safe", but the second sounds a bit more complete, since you would be able to compare the variation within each cave with the differences across caves. This would strengthen the evolutionary analysis.
  • Hi Pedro,

    Thanks for your comment. It's nice to receive feedback on my ideas now that I'm the only bioinformatician in my department. I'd like to do the second analysis, with multiple genomes per cave, but I worry that we won't have enough sequence data from 454 for four genomes. The P. fluorescens genomes are quite large, so it's possible that we won't get enough coverage. I guess we don't need a complete assembly of the genome, just enough to find most of the genes complete in each isolate.
  • Mike
    Hi Yannik, thanks for the suggestion. Illumina would provide more reads, but the Kentucky Ecological Initiative has a 454 sequencer which is what we have access to.
  • Why not use an Illumina machine?
    Illumina chips are already divided up into eight lanes (for seven samples + one control), so no need for the tagging step which can introduce biases...

    And bacterial genome assembly from 50/75bp Illumina data is becoming mainstream...