Last week I determined the likely order of our Pseudomonas fluorescens R124 sequencing scaffolds by mapping them onto reference genomes from the same species. This mapping also indicated that two of the scaffolds (5 and 8) didn’t align (see this figure) and therefore may not be part of the genome assembly. The next logical step was to find out what type of sequence these scaffolds represented.
A megablast search showed that scaffold 5 did align to reference P. fluorescens genomes, which was surprising since, as I wrote above, scaffold 5 did not appear to be part of the assembly. On closer inspection, however, scaffold 5 is only ~5 kb in size while the scaffold map I produced was on a megabase scale. Scaffold 5 was therefore just too small to be seen by eye when compared to the other, much larger scaffolds.
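To illustrate why a ~5 kb scaffold vanishes on a megabase-scale map, here is a minimal Python sketch that reads scaffold lengths from FASTA and estimates how many pixels a scaffold would occupy in a figure. The function names, the toy sequences, the 6 Mb total and the 1000 pixel figure width are all assumptions made for illustration; they are not values taken from the R124 assembly.

```python
from io import StringIO


def scaffold_lengths(fasta_handle):
    """Return {record_id: sequence_length} for an open FASTA handle."""
    lengths = {}
    current = None
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record id.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def plotted_width(length_bp, total_bp, figure_px):
    """Pixels a scaffold occupies when total_bp is drawn across figure_px."""
    return length_bp / total_bp * figure_px


# Toy FASTA data (hypothetical, not real scaffolds).
demo = StringIO(">scaffold_a\nACGTACGT\nACGT\n>scaffold_b\nAC\n")
lengths = scaffold_lengths(demo)

# Assumed numbers: a 5 kb scaffold on a 6 Mb map drawn 1000 pixels wide.
width = plotted_width(5_000, 6_000_000, 1000)
print(lengths, round(width, 2))
```

Under these assumed numbers the scaffold spans less than a single pixel, which is consistent with it being invisible by eye. Checking a table of lengths before reading a map is a cheap way to catch this.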
The blast search using scaffold 8 returned a more interesting result. The best hit was a plasmid in Pseudomonas syringae pv. phaseolicola. The alignment between scaffold 8 and the plasmid is shown below (click for the larger version) where the plasmid open reading frames are shown in red and the aligned scaffold 8 regions are shown in blue.

This result indicates that the likely reason scaffold 8 does not align to any of the reference genomes is that it is plasmid in origin rather than genomic. A further blastx search with this scaffold identified four regions with sequence similarity to known proteins, which are as follows: conjugal transfer proteins involved in the transfer of genetic material, topoisomerases involved in unwinding DNA, and relaxases and replicases, which are likely to be involved in plasmid replication. A fifth type of protein may be related to Type IV (DNA or protein) secretion; however, the functional annotation of these hits was less clear. The blastx image result is shown below.

I’m still learning microbial genomics, and I suspect it’s unsurprising to discover a plasmid containing sequence similar to genes involved in replication and transfer. What does spark my interest is that the above blast image shows the rest of the plasmid does not appear in the first 100 results returned by blast. This might indicate there is relatively novel data, with low sequence similarity to known genes, waiting to be analysed.
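Rather than judging this from the image alone, the stretches of a scaffold with no hits could be listed from blast's tabular output. The sketch below assumes the standard 12-column tabular format (`-outfmt 6`), where columns 7 and 8 hold the query start and end. The hit lines, protein names and the 1000 bp query length are made up for illustration and are not results from scaffold 8.

```python
def query_intervals(tabular_lines, query_id):
    """Collect (start, end) query coordinates for one query from tabular hits."""
    intervals = []
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] != query_id:
            continue
        qstart, qend = int(fields[6]), int(fields[7])
        # Start can exceed end for reverse-strand hits, so normalise the order.
        intervals.append((min(qstart, qend), max(qstart, qend)))
    return intervals


def merge_intervals(intervals):
    """Merge overlapping or directly adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered_regions(intervals, query_length):
    """Return the 1-based inclusive regions of the query that no hit covers."""
    gaps = []
    position = 1
    for start, end in merge_intervals(intervals):
        if start > position:
            gaps.append((position, start - 1))
        position = max(position, end + 1)
    if position <= query_length:
        gaps.append((position, query_length))
    return gaps


# Hypothetical hits for a hypothetical 1000 bp query.
hits = [
    "scaffold_8\tprotA\t60.0\t100\t40\t0\t101\t400\t1\t100\t1e-20\t150",
    "scaffold_8\tprotB\t55.0\t50\t22\t0\t700\t551\t1\t50\t1e-10\t80",
    "scaffold_8\tprotC\t70.0\t34\t10\t0\t350\t450\t1\t34\t1e-5\t50",
]
gaps = uncovered_regions(query_intervals(hits, "scaffold_8"), 1000)
print(gaps)
```

A region reported here only means nothing matched among the hits returned, so the result depends on the hit limit and e-value cutoff used for the search.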
UPDATE: Morgan Langille has rightly pointed out in a comment below that scaffold 8 could have low sequence similarity and still be part of the R124 genome if it’s an inserted genomic island.