January 5, 2010

Genomics in a small microbiology lab

My post-doc project involves the genomics of micro-organisms from starved cave environments. Several universities in the Kentucky area have banded together to buy a sequencer, which allows a small microbiology lab like ours to do sequencing for a few thousand dollars. The biology department here doesn’t have the dedicated computing cluster required for genome assembly and analysis. However, on-demand computing resources mean this isn’t a problem, because we can rent a virtual machine with 64GB of RAM by the hour. The only bottleneck in my project will therefore be my ability to formulate a research question and properly analyse the genomic data.

The availability of cheaper sequencing and by-the-hour computer time means that smaller research laboratories are no longer restricted in their ability to do genomics. It’s easy to see that, only a few years ago, sequencing costs put novel genomics out of reach for most labs, and only labs at large institutions had access to dedicated computing facilities. Having moved from a large university to a small one, I find that the financial and infrastructure barriers to doing genomics now seem much lower. Genomics, in microbes at least, can now be carried out by hundreds of smaller labs instead of being concentrated at a few large sequencing centres and universities.

When I started my master’s five years ago, I remember that most papers began by discussing the "explosion of sequence data", but I think cheaper sequencing means the explosion is only just beginning. Now is a great time to be a bioinformatician: sequencing and computational power are much easier to access, and the problem will be finding people who can manage and process the data.

September 29, 2008

I wish there was Ruby on Rails for data

I’d like to think that learning Ruby on Rails has benefited my research. I’m certain that ActiveRecord has made it much, much easier to bridge the gap between my code and my database. I think validations make it easier for me to weed out bad data points in large data sets. I know for sure that RSpec has made it easy for me to test for every bug that I can think of in my code.
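To illustrate the idea of using validations to weed out bad data points, here is a minimal plain-Ruby sketch. It only imitates the pattern of ActiveRecord validations (rules attached to a record, with an `errors` list and a `valid?` check) and is not the Rails API. The `DataPoint` fields and the rules themselves are hypothetical examples.

```ruby
# Hypothetical record type for one data point; the field names are
# illustrative assumptions, not taken from any real data set.
DataPoint = Struct.new(:gene_id, :count) do
  # Collect human-readable problems, loosely mirroring how an
  # ActiveRecord model accumulates errors during validation.
  def errors
    problems = []
    problems << "gene_id is blank" if gene_id.nil? || gene_id.to_s.strip.empty?
    unless count.is_a?(Integer) && count >= 0
      problems << "count must be a non-negative integer"
    end
    problems
  end

  def valid?
    errors.empty?
  end
end

# Split a data set into points that pass every check and points that fail.
def weed_out(points)
  points.partition(&:valid?)
end

points = [
  DataPoint.new("geneA", 12),
  DataPoint.new("", 5),
  DataPoint.new("geneB", -3),
  DataPoint.new("geneC", 0)
]

good, bad = weed_out(points)
puts "kept #{good.size}, rejected #{bad.size}"
bad.each { |p| puts "#{p.gene_id.inspect}: #{p.errors.join(', ')}" }
```

Keeping the rules on the record type means every import or processing step applies the same checks. That is the main benefit validations bring to large data sets.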

My nagging worry is that Rails was primarily designed for building web applications with a ‘nice’ sized dataset in the database. I can’t really say what a nice-sized dataset is, but I can guess that it is not 14 million rows. The difference between what Rails was designed for and my use of it for bioinformatics becomes obvious when I search for information. Material on writing a particular type of spec is easy to find, but material on processing an ActiveRecord model over a cluster is not. I think the gap between using Rails for its original purpose of building web applications and subverting it into a framework for a data-centric project lies in data processing, such as statistics, analysis and plotting.
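The kind of data processing that sits outside Rails can be illustrated with a minimal sketch, assuming rows can be streamed one at a time rather than loaded all at once. It computes a mean and variance in fixed-size batches using Welford's online algorithm, so memory use stays constant regardless of table size. The `each_row` method is a hypothetical stand-in for a large table. In a Rails app, the row source might instead be an ActiveRecord batch iterator such as `find_each`, which later versions of ActiveRecord provide.

```ruby
# Welford's online algorithm: updates mean and variance one value at a
# time, so memory use stays constant no matter how many rows stream past.
class RunningStats
  attr_reader :n, :mean

  def initialize
    @n = 0
    @mean = 0.0
    @m2 = 0.0 # running sum of squared deviations from the mean
  end

  def add(x)
    @n += 1
    delta = x - @mean
    @mean += delta / @n
    @m2 += delta * (x - @mean)
  end

  # Sample variance; undefined (nil) for fewer than two values.
  def variance
    @n > 1 ? @m2 / (@n - 1) : nil
  end
end

# Hypothetical stand-in for a large table: yields rows lazily instead of
# materialising them all in memory.
def each_row(total)
  return enum_for(:each_row, total) unless block_given?
  total.times { |i| yield({ id: i, value: (i % 10).to_f }) }
end

stats = RunningStats.new
each_row(100_000).each_slice(10_000) do |batch|
  # Fixed-size slices bound memory use. Spreading slices across a cluster
  # would also require merging per-worker statistics, which is not shown.
  batch.each { |row| stats.add(row[:value]) }
end

puts "n=#{stats.n} mean=#{stats.mean.round(3)} variance=#{stats.variance.round(3)}"
```

Welford's update avoids the numerical problems of accumulating a sum and a sum of squares separately. This matters when the row count runs into the millions.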