March 15, 2010

Discovering a plasmid in our sequence data

Last week I determined the likely order of our Pseudomonas fluorescens R124 sequencing scaffolds by mapping them onto reference genomes from the same species. This mapping also showed that two of the scaffolds (5 and 8) didn't align (see this figure) and therefore might not be part of the genome assembly. The next logical step was to find out what type of sequence these scaffolds represented.
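The check for scaffolds that fail to align can be sketched as below. This is a minimal illustration, not the method I used. It assumes the alignment results have already been reduced to hypothetical `(scaffold, reference, aligned_bases)` records. Any scaffold whose best alignment covers less than a chosen fraction of its length is flagged.

```python
from collections import defaultdict

def unaligned_scaffolds(lengths, alignments, min_fraction=0.5):
    """Return scaffolds whose best reference alignment covers less than
    min_fraction of their length.

    lengths    -- dict mapping scaffold name to length in bases
    alignments -- iterable of (scaffold, reference, aligned_bases) tuples
                  (a hypothetical, pre-summarised format)
    """
    best = defaultdict(int)  # best aligned length per scaffold over all references
    for scaffold, _reference, aligned in alignments:
        best[scaffold] = max(best[scaffold], aligned)
    # Scaffolds with no alignment at all get 0 from the defaultdict.
    return sorted(
        name for name, length in lengths.items()
        if best[name] / length < min_fraction
    )

# Hypothetical example data: scaffold 8 has no alignment to either reference.
lengths = {"scaffold5": 5_000, "scaffold7": 1_200_000, "scaffold8": 60_000}
alignments = [
    ("scaffold5", "refA", 4_800),
    ("scaffold7", "refA", 1_100_000),
    ("scaffold7", "refB", 1_150_000),
]
print(unaligned_scaffolds(lengths, alignments))  # ['scaffold8']
```

The 50% threshold is arbitrary. A stricter or looser cut-off changes which borderline scaffolds are flagged.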

A megablast search showed that scaffold 5 did align to reference P. fluorescens genomes. This was surprising since, as I wrote above, scaffold 5 did not appear to be part of the assembly. On closer inspection, however, scaffold 5 is only ~5 kb in size, while the scaffold map I produced was on a megabase scale. Scaffold 5 was therefore simply too small to see by eye next to the other, much larger scaffolds.
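A quick calculation shows why a scaffold this small disappears on a megabase-scale map. The genome size (6 Mb) and figure width (1,000 pixels) below are illustrative assumptions, not values from our assembly or figure.

```python
def pixels_for(feature_bp, genome_bp, figure_width_px):
    """Width in pixels of a feature when the whole genome spans the figure."""
    return feature_bp / genome_bp * figure_width_px

# Assumed values: a ~6 Mb genome drawn across a 1,000-pixel-wide figure.
width = pixels_for(feature_bp=5_000, genome_bp=6_000_000, figure_width_px=1_000)
print(f"{width:.2f} px")  # 0.83 px
```

At under one pixel wide, a 5 kb scaffold is effectively invisible beside scaffolds that span hundreds of pixels.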

The BLAST search using scaffold 8 returned a more interesting result. The best hit was a plasmid in Pseudomonas syringae pv. phaseolicola. The alignment between scaffold 8 and the plasmid is shown below. The plasmid open reading frames are shown in red and the aligned scaffold 8 regions in blue.
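Picking the top hit from a BLAST search can be sketched using BLAST+ tabular output (`-outfmt 6`). Its twelve default columns end with the bit score. This workflow is an assumption, since the post does not say which output format was used. The example rows and subject IDs are invented placeholders, not our actual results.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def best_hits(handle):
    """Return the highest-bitscore hit for each query in -outfmt 6 output."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        hit = dict(zip(FIELDS, row))
        hit["bitscore"] = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best

# Hypothetical two-line result for scaffold 8; subject IDs are placeholders.
example = io.StringIO(
    "scaffold8\tplasmid_X\t85.0\t2000\t300\t0\t1\t2000\t500\t2499\t0.0\t2100\n"
    "scaffold8\tgenome_Y\t80.0\t500\t100\t0\t3000\t3499\t1\t500\t1e-50\t400\n"
)
print(best_hits(example)["scaffold8"]["sseqid"])  # plasmid_X
```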

This result suggests that scaffold 8 does not align to any of the reference genomes because it is plasmid rather than genomic in origin. A further blastx search with this scaffold identified four regions with sequence similarity to known proteins. These were conjugal transfer proteins, involved in the transfer of genetic material, and topoisomerases, involved in unwinding DNA. The other two were relaxases and replicases, which are likely to be involved in plasmid replication. A fifth type of protein may be related to Type IV (DNA or protein) secretion, although the functional annotation of these hits was less clear. The blastx result is shown below.
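Grouping blastx hits into distinct regions of the scaffold amounts to merging overlapping query intervals. The sketch below assumes the hit coordinates were taken from the `qstart`/`qend` columns of tabular output. The coordinates in the example are made up. In blastx output, hits in reverse-strand frames report `qstart` greater than `qend`, so each pair is normalised first.

```python
def merge_regions(intervals, max_gap=0):
    """Merge (start, end) hit coordinates into non-overlapping regions.

    Coordinates are 1-based and inclusive, as in BLAST tabular output.
    Overlapping or directly adjacent intervals always merge; intervals
    separated by at most max_gap bases are also joined.
    """
    # Normalise so start <= end (reverse-strand hits are reported end-first).
    spans = sorted((min(s, e), max(s, e)) for s, e in intervals)
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + max_gap + 1:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current region
        else:
            merged.append([start, end])  # start a new region
    return [tuple(region) for region in merged]

# Hypothetical blastx hit coordinates on a scaffold.
hits = [(100, 900), (850, 1500), (4200, 3000), (9000, 9600), (20000, 21000)]
print(merge_regions(hits))
# [(100, 1500), (3000, 4200), (9000, 9600), (20000, 21000)]
```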

I’m still learning microbial genomics, and I suspect it’s unsurprising to find a plasmid with sequence similarity to genes involved in replication and transfer. What does spark my interest is that, in the BLAST image above, the rest of the plasmid does not appear in the first 100 results. This might indicate there is relatively novel data, with low sequence similarity to known genes, waiting to be analysed.
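Listing the stretches of the scaffold that have no hit at all is a simple way to pull out candidate novel sequence. This is a minimal sketch, assuming the hit coordinates on the scaffold are available as `(start, end)` pairs. The scaffold length, coordinates and 1 kb minimum size are hypothetical. It reflects only the hits actually returned. A stretch with nothing in the first 100 results may still have weaker matches further down the list.

```python
def uncovered_regions(length, intervals, min_size=1):
    """Return (start, end) stretches of a sequence of the given length that
    no interval covers, keeping only those at least min_size bases long.
    Coordinates are 1-based and inclusive."""
    gaps = []
    next_free = 1  # first base not yet known to be covered
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if start > next_free:
            gaps.append((next_free, start - 1))
        next_free = max(next_free, end + 1)
    if next_free <= length:
        gaps.append((next_free, length))  # trailing uncovered stretch
    return [(s, e) for s, e in gaps if e - s + 1 >= min_size]

# Hypothetical: a 25 kb scaffold with hits covering four regions.
hits = [(100, 1500), (3000, 4200), (9000, 9600), (20000, 21000)]
print(uncovered_regions(25_000, hits, min_size=1_000))
# [(1501, 2999), (4201, 8999), (9601, 19999), (21001, 25000)]
```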

UPDATE: Morgan Langille has rightly pointed out in a comment below that scaffold 8 could have low sequence similarity and still be part of the R124 genome if it’s an inserted genomic island.

  • I'm not really sure your analysis confirms that it is a plasmid. It could just as easily be a newly inserted region in the genome (maybe with a plasmid intermediate). Many of these large HGT events, or genomic islands (GIs), will have these low sequence similarity scores. Pseudomonas is also well known for containing GIs. Good luck with your analysis!

  • I've added an update to the post; it's good to get feedback and discussion.

  • Thanks Morgan. You're right: we won't know which is the case until we can either circularise the scaffold to produce a complete plasmid or bridge the gaps to the other scaffolds and integrate it into the genome build. We're planning to do this soon, but at the moment we're concentrating on closing the smaller internal scaffold gaps.
