I’ve spent the last five years as a computational scientist, and my research begins with pulling data out of files. I’m far removed from the laboratories that generated the data in the first place. This past month, however, I’ve had to learn about, and make decisions on, producing enough data to generate a complete genome. Before starting this postdoc in November 2009, I assumed that second generation sequencing easily allowed small labs like ours to obtain complete genome sequences. The reality, however, involves problems I would never have considered.
For example, paired-end sequencing is useful for assembling sequence reads into a de novo scaffold because the distance between each pair of reads is known. However, the extra effort required to prepare a paired-end sample adds a couple of thousand dollars to the cost. The expenses involved in research, and how they affect the project outline, are not something I have had to consider before, because all I usually need is a computer and a desk.
Apart from cost, we also have to decide how many genomes we want to sequence and how to fit them on a single 454 plate. One approach is to use a rubber gasket to divide the plate into 2, 4, 8, or 16 sections and allocate a single sample to each section. The downside of this approach is that the gasket covers some of the sequencing wells on the plate, so the more the plate is divided, the less sequencing capacity is available. The alternative to the rubber gasket is to label each sample with molecular barcodes; however, this incurs more costs because of the additional sample preparation.
When determining how many genomes to sequence, we also had to consider the amount of sequence coverage for each genome. As we try to sequence more genomes, there is less read depth for each individual genome, and therefore each genome is harder to assemble. This constrains our research aim of sequencing four Pseudomonas fluorescens isolates. Our choice of research question is therefore a balance between what we would like to achieve and the costs and amount of sequencing coverage available on a 454 plate.
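The trade-off between the number of genomes and read depth can be sketched with a little arithmetic: average coverage is roughly the total number of sequenced bases divided by the combined size of the genomes sharing the plate. The snippet below is only a rough illustration, not the calculation we actually used. The plate yield and genome size are placeholder figures chosen for the example, and the `usable_fraction` parameter is a hypothetical stand-in for capacity lost to a gasket.

```python
def coverage_per_genome(plate_yield_bp, genome_size_bp, n_genomes, usable_fraction=1.0):
    """Approximate average read depth per genome when a plate is shared equally.

    usable_fraction is a hypothetical knob for wells lost to a gasket
    (1.0 means no loss); it is not a measured value.
    """
    if n_genomes < 1:
        raise ValueError("n_genomes must be at least 1")
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    usable_bp = plate_yield_bp * usable_fraction
    return usable_bp / (n_genomes * genome_size_bp)


# Illustrative placeholder numbers only, not figures from our run.
PLATE_YIELD_BP = 400_000_000   # assumed total bases from one plate
GENOME_SIZE_BP = 6_500_000     # assumed size of one bacterial genome

for n in (1, 2, 4, 8):
    depth = coverage_per_genome(PLATE_YIELD_BP, GENOME_SIZE_BP, n)
    print(f"{n} genome(s): ~{depth:.1f}x average coverage")
```

Doubling the number of genomes halves the depth for each one, and any capacity lost to the gasket lowers it further, which is why the number of isolates cannot be chosen independently of the assembly quality we need.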
Choices
I’m writing this from the point of view of my initial surprise at the difficulties of planning a sequencing project, rather than to complain. The people who will do the sequencing for us have been very helpful. Also, it’s cheaper second generation sequencing that has made this research project possible.