We have received the 454 results for our two samples, thanks to the University of Kentucky AGCT sequencing centre. The genomes sequenced were from two P. fluorescens isolates, R124 and KY485, cultured from two separate cave sites. The relationships of these cave strains to other P. fluorescens strains are shown below in the phylogenetic tree constructed from a 16S ribosomal gene alignment. The tree places the cave strains we are sequencing alongside strains that have already been sequenced or are currently being sequenced.

The current genomic scaffolds for both the R124 and KY485 strains are available on GitHub. I’m going to update the repositories as gaps are closed and the genomes are annotated. The initial sequencing results show that the genomes of both isolates are larger than we expected. The predicted genome size and coverage from the Roche GS De Novo Assembler (newbler) run for each strain are illustrated in the chart below (see here for the R code and data). The figure includes the genome sizes of already sequenced P. fluorescens isolates as references.

Genome size
The graph shows that both cave-isolate genomes appear larger than any existing P. fluorescens genome. The R124 strain is predicted to be marginally larger, by ~0.3 Mbp, than the largest already sequenced P. fluorescens genome, while the KY485 strain is much larger, by >4 Mbp. The sequence data is relatively fresh, however, so we expect the estimated genome sizes to change as we try to generate a complete build. Furthermore, the current data may contain plasmid sequences, which would inflate the size estimates.
Sequencing coverage
The unexpectedly large size of each genome resulted in less coverage than we had hoped for. The portion of each predicted genome covered by scaffolds is shown by the darker grey bars in the bar chart above. The R124 assembly has a reasonable ~85% of the predicted genome at 22X coverage. For KY485, however, we have only ~44% of the predicted genome at 17X coverage, less than half the genome. A large portion of the KY485 genome is therefore still unknown.
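For anyone wanting to check this arithmetic, here is a minimal Python sketch of the two quantities discussed above. It assumes the scaffold fraction is simply scaffold bases divided by predicted genome size, and that depth is total read bases divided by scaffold bases. Newbler may compute its figures differently. The function names and all input numbers are hypothetical, not the actual R124 or KY485 values.

```python
def scaffold_fraction(scaffold_bases: int, predicted_genome_bases: int) -> float:
    """Fraction of the predicted genome that is represented in scaffolds."""
    if predicted_genome_bases <= 0:
        raise ValueError("predicted genome size must be positive")
    return scaffold_bases / predicted_genome_bases


def mean_depth(total_read_bases: int, scaffold_bases: int) -> float:
    """Average depth: bases sequenced per base of assembled scaffold."""
    if scaffold_bases <= 0:
        raise ValueError("scaffold length must be positive")
    return total_read_bases / scaffold_bases


if __name__ == "__main__":
    # Hypothetical inputs for illustration only (not our real data).
    predicted = 7_000_000      # predicted genome size in bp
    in_scaffolds = 5_950_000   # bp currently placed in scaffolds
    read_bases = 130_900_000   # total bp across all reads

    frac = scaffold_fraction(in_scaffolds, predicted)
    print(f"{frac:.0%} of the predicted genome is in scaffolds")
    print(f"{predicted - in_scaffolds:,} bp still unaccounted for")
    print(f"mean depth: {mean_depth(read_bases, in_scaffolds):.0f}X")
```

The same two functions can be pointed at the real scaffold and read totals once the assembly statistics are exported.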
Next step
Over the next few weeks we will be trying to bridge gaps in the smaller of the two genomes using PCR and traditional sequencing. I’ll also be trying to estimate the size of the gaps in each genome assembly using other P. fluorescens genomes as a reference, and to determine whether any differences in genomic GC content suggest the presence of plasmids in the sequencing data.
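As a rough idea of how the GC content check might look, the Python sketch below computes GC content per scaffold and flags any scaffold that deviates from the length-weighted average by more than a chosen threshold. This is only one simple way to do it, not a settled method. The function names, the 0.05 threshold and the toy sequences are all assumptions for illustration. Real scaffolds would be read from the assembly FASTA files.

```python
def base_counts(seq: str) -> tuple[int, int]:
    """Return (GC count, total A/C/G/T count), ignoring N and other symbols."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc, gc + at


def gc_content(seq: str) -> float:
    """GC fraction of a sequence; 0.0 if it has no unambiguous bases."""
    gc, total = base_counts(seq)
    return gc / total if total else 0.0


def flag_gc_outliers(scaffolds: dict[str, str], threshold: float = 0.05) -> list[str]:
    """Names of scaffolds whose GC fraction differs from the length-weighted
    average of all scaffolds by more than `threshold` (an assumed cutoff)."""
    counts = {name: base_counts(seq) for name, seq in scaffolds.items()}
    total_gc = sum(gc for gc, _ in counts.values())
    total = sum(n for _, n in counts.values())
    if total == 0:
        return []
    overall = total_gc / total
    return [
        name
        for name, (gc, n) in counts.items()
        if n and abs(gc / n - overall) > threshold
    ]


if __name__ == "__main__":
    # Toy sequences, made up for illustration (not real scaffold data).
    toy = {
        "scaffold_1": "GCGCGCATAT" * 100,  # long, 60% GC
        "scaffold_2": "ATATATGCAT" * 10,   # short, 20% GC
    }
    for name, seq in toy.items():
        print(name, f"{gc_content(seq):.2f}")
    print("possible plasmid candidates:", flag_gc_outliers(toy))
```

A flagged scaffold is only a candidate: horizontally acquired chromosomal regions can also differ in GC content, so any hit would still need checking against known plasmid sequences.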