A year of work on the importance of amino acid biosynthetic cost has led to the submission of a manuscript and a preprint available on Nature Precedings. The openness in this project was inspired by reading Jean Claude Bradley’s and Cameron Neylon’s blogs about open notebook science. I already believed in the philosophy behind open source software, and I thought that any early feedback would be useful to my research. In addition to any input received, I thought that sharing my research early would in turn be a way to contribute back to the community.
The platform I chose was a blog, allowing results to be posted as I produce them. I was already familiar with blogging, and WordPress makes creating and maintaining a blog simple. During the early stages of my project I found it quite useful to blog, as it helped me to clarify my results and ideas while the project was still taking shape. I tried to do this about once a week, on a Friday, and summarise my latest results. Having this record of results was also helpful to refer to when discussing my latest findings. When we were writing the manuscript I also found it useful to browse back through all the entries I had created and include any ideas I had forgotten about. However, as the project progressed blogging became less important, as I had already produced my main findings and was more focused on writing the manuscript.
As for sharing information, I found that writing a summary blog post about my research takes rather a large amount of effort. Furthermore, my blog is the only gateway to my research, and results only become available when I make the time and effort to write them up. This therefore doesn’t satisfy Jean Claude Bradley’s criterion of no insider knowledge, but rather could be described as being selectively open about my research. On the positive side, a blog post is a concise summary that distills my most recent progress in a way I hope is easily accessible to a casual reader. Another interesting point is that posting all my results online meant they were indexed by Google, as you would expect, but this also led to some strange occurrences when searching online for material. For example, searching for “Akashi & Gojobori”, a paper I based my work on, brings up two links to my blog ahead of the original manuscript. I find this a bit embarrassing, and I wonder if the paper’s authors have also encountered this?
With less time to spend on blogging, I also tried to stream my research using Twitter, sending short messages automatically using a bash script every time I committed a change to SVN. While this approach takes a lot less effort on my part, I think it sits at the opposite end of the spectrum to blogging, as it spews out large amounts of obscure repository check-in messages. Ultimately I think it is of little interest even to someone directly involved in the project.
I’m still interested in open notebook science, though my lack of posting might indicate otherwise. I’m going to continue trying out new methods of sharing bioinformatics research, and the start of a new research project gives me the chance to start afresh with these approaches. My main focus should be passive approaches that build into my workflow without too much effort, but also produce a meaningful summary of the research. Therefore, in addition to a blog, I think it is important to maintain a summary page of the research, otherwise it may be difficult for people to understand what the point of my research is when they first come across my blog. I think this is similar to the combined wiki and blog format used by Jean Claude Bradley. Having spent some time thinking about how I could implement this, I think a landing page should be automatically generated from the results. In my head I’m thinking of a Ruby on Rails type of approach, with a templating library such as HAML and a series of Rake tasks to regenerate the landing page with any new results, as well as send out a Twitter update.
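To make the landing page idea a little more concrete, here is a rough sketch of regenerating a summary page from a list of results. It uses Python and the standard library's string.Template rather than the Rails, HAML and Rake combination I have in mind, purely to keep the example self-contained. The function name render_landing_page, the page layout and the (date, headline) shape of a result are all assumptions made up for illustration, not something I have built.

```python
from html import escape
from string import Template

# Hypothetical page layout; a real project would keep this in a template file.
PAGE = Template("""<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
<p>$summary</p>
<ul>
$items
</ul>
</body>
</html>
""")


def render_landing_page(title, summary, results):
    """Build the landing page HTML from the project's results.

    `results` is assumed to be a list of (iso_date, headline) tuples.
    ISO dates sort correctly as plain strings, so a reverse sort puts
    the newest result at the top of the list.
    """
    items = "\n".join(
        f"<li>{escape(date)}: {escape(headline)}</li>"
        for date, headline in sorted(results, reverse=True)
    )
    return PAGE.substitute(
        title=escape(title), summary=escape(summary), items=items
    )
```

A scheduled task or a commit hook could call render_landing_page after each new result and write the returned string to the page that visitors see first, so the summary stays current without my having to write anything by hand.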
Finally, I thought it might be interesting to adopt version numbers for the project, similar to those used in software development. The usual layout is something like 1.2.3. The last number would be used to track simple code edits. The second number would be used to show milestones in the overall project; for example, each could correspond to a figure. The first number would then be the manuscript revision. Every time a new manuscript is prepared for submission this would be incremented, with the first manuscript preparation having the number 1.0.0. Hopefully this type of numbering would make the project easier to track, and interested parties could see if the research has been updated significantly since they last checked.
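The numbering scheme could be sketched in a few lines of Python. The class and method names here are hypothetical, and resetting the lower numbers to zero when a higher one is bumped is my assumption, borrowed from the usual software convention rather than anything I have settled on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectVersion:
    manuscript: int  # first number: manuscript revision
    milestone: int   # second number: project milestone, e.g. a figure
    edit: int        # third number: simple code edits

    @classmethod
    def parse(cls, text):
        """Parse a string such as '1.2.3' into its three parts."""
        manuscript, milestone, edit = (int(part) for part in text.split("."))
        return cls(manuscript, milestone, edit)

    def bump_edit(self):
        return ProjectVersion(self.manuscript, self.milestone, self.edit + 1)

    def bump_milestone(self):
        # Assumption: a new milestone resets the edit counter.
        return ProjectVersion(self.manuscript, self.milestone + 1, 0)

    def bump_manuscript(self):
        # Assumption: a new manuscript resets milestone and edit counters.
        return ProjectVersion(self.manuscript + 1, 0, 0)

    def __str__(self):
        return f"{self.manuscript}.{self.milestone}.{self.edit}"
```

Under these assumptions a project at 0.3.7 would move to 0.3.8 after a code edit, to 0.4.0 after a new figure, and to 1.0.0 when the first manuscript is prepared for submission.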
In summary, open notebook science has not really had a large positive effect on my research. I think that this is mainly because using a blog alone is not an effective method of communicating scientific progress: first, it requires substantial effort on my part to update, and second, tracking the current state of the research can be difficult. However, I still believe that the principles of open notebook science can be beneficial to my research. In the next couple of months I’ll try some new methods to see what does work.