April 29, 2008

Reflection on a year of (attempted) open notebook science

A year of work on the importance of amino acid biosynthetic cost has led to the submission of a manuscript, and a preprint available on Nature Precedings. The openness in this project was inspired by reading Jean-Claude Bradley’s and Cameron Neylon’s blogs about open notebook science. I already believed in the philosophy behind open source software, and I thought that any early feedback would be useful to my research. In addition to any input received, I thought that sharing my research early would be a way to contribute something back to the community.

The platform I chose was a blog, allowing results to be posted as I produce them. I was already familiar with blogging, and WordPress makes creating and maintaining a blog simple. During the early stages of my project I found it quite useful to blog, as it helped me to clarify my results and ideas while the project was still taking shape. I tried to do this about once a week, on a Friday, and summarise my latest results. Having this record of results was also helpful to refer to when discussing my latest findings. When we were writing the manuscript I also found it useful to browse back through all the entries I had created and include any ideas I had forgotten about. However, as the project progressed blogging became less important, as I had already produced my main findings and was more focused on writing the manuscript.

As for sharing information, I found that writing a summary blog post about my research takes rather a large amount of effort. Furthermore, my blog is the only gateway to my research, and results only become available when I make the time and effort to write them up. This therefore doesn’t satisfy Jean-Claude Bradley’s criterion of no insider knowledge, but rather could be described as being selectively open about my research. On the positive side, a blog post is a concise summary that distills my most recent progress in a way I hope is easily accessible to a casual reader. Another interesting point is that posting all my results online meant they were indexed by Google, as you would expect, but this also led to some strange occurrences when searching online for material. For example, searching for “Akashi & Gojobori”, a paper I based my work on, brings up two links to my blog ahead of the original manuscript. I find this a bit embarrassing, and I wonder if the paper’s authors have also encountered this.

With less time to spend on blogging, I also tried to stream my research using Twitter, sending short messages automatically using a bash script every time I committed an SVN update. While this approach takes a lot less effort on my part, I think it is at the opposite end of the spectrum to blogging, and spews out large amounts of obscure repository check-in messages. Ultimately I think it is of little interest even to someone directly involved in the project.
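As a rough illustration of the kind of message this produced, here is a minimal sketch in Python rather than the bash I actually used. The function name `format_commit_update` and the `r<revision>:` prefix are made up for this example, and the 140-character limit is the Twitter limit at the time of writing. It only builds the text and does not talk to SVN or Twitter.

```python
def format_commit_update(revision: int, message: str, limit: int = 140) -> str:
    """Build a one-line status update from a commit message (hypothetical helper)."""
    # Collapse newlines and repeated spaces so a multi-line commit
    # message fits on a single line.
    summary = " ".join(message.split())
    text = f"r{revision}: {summary}"
    # Truncate and mark the cut with "..." if the text is too long.
    if len(text) > limit:
        text = text[: limit - 3].rstrip() + "..."
    return text


if __name__ == "__main__":
    print(format_commit_update(42, "Fix axis labels\non cost plot"))
```

Even a tidy one-liner like this only says what changed in the repository, not what it means for the research, which is the problem described above.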

I’m still interested in open notebook science, though my lack of posting might indicate otherwise. I’m going to continue trying out new methods of sharing bioinformatics research, and the start of a new research project gives me the chance to start afresh in these approaches. My main focus should be passive approaches that build into my workflow without too much effort, but also produce a meaningful summary of the research. Therefore, in addition to a blog, I think it is important to maintain a summary page of the research, otherwise it may be difficult for people to understand what the point of my research is when they first come across my blog. I think this is similar to the combined wiki and blog format used by Jean-Claude Bradley. Having spent some time thinking about how I could implement this, I think a landing page should be auto-generated from the results. In my head I’m thinking of a Ruby on Rails type of approach, with a templating library such as HAML and a series of Rake tasks to regenerate the landing page with any new results, as well as send out a Twitter update.
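To make the landing page idea a little more concrete, here is a minimal sketch of regenerating a summary page from a list of results. It uses Python's standard `string.Template` as a stand-in for the HAML and Rake setup I have in mind, and the function name `render_landing_page`, the page layout, and the example results are all invented for illustration.

```python
import html
from string import Template

# A deliberately plain page layout; a real one would carry more detail.
PAGE = Template("""<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
<p>$summary</p>
<ul>
$items
</ul>
</body>
</html>
""")


def render_landing_page(title, summary, results):
    """Return the landing page as an HTML string.

    `results` is a list of (date, description) pairs, newest first.
    """
    # Escape all text so characters like & and < display correctly.
    items = "\n".join(
        f"<li>{html.escape(date)}: {html.escape(text)}</li>"
        for date, text in results
    )
    return PAGE.substitute(
        title=html.escape(title),
        summary=html.escape(summary),
        items=items,
    )


if __name__ == "__main__":
    print(render_landing_page(
        "Example project",
        "One-paragraph statement of what the project is about.",
        [("2008-04-25", "Example result entry")],
    ))
```

The point of the design is that the page is rebuilt from the results themselves, so keeping it current costs no extra writing effort.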

Finally, I thought it might be interesting to adopt version numbers for the project, similar to those used in software development. The usual layout is something like 1.2.3. The last number would be used to track simple code edits. The second number would be used to show milestones in the overall project, for example each could correspond to a figure. The first number would then be the manuscript revision. Every time a new manuscript is prepared for submission, this could be updated, where the first manuscript preparation would have the number 1.0.0. Hopefully this type of numbering would make the project easier to track, and interested parties could see if the research has been updated significantly since they last checked.
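A minimal sketch of this numbering scheme in Python follows. The class name `ProjectVersion` and its methods are made up for illustration. Two details are my assumptions rather than part of the scheme as described: work before the first manuscript is numbered 0.x.y, and bumping a number resets the numbers to its right to zero, as is common in software versioning.

```python
from typing import NamedTuple


class ProjectVersion(NamedTuple):
    manuscript: int  # manuscript revision (first number)
    milestone: int   # project milestone, e.g. a figure (second number)
    edit: int        # simple code edit (last number)

    def __str__(self) -> str:
        return f"{self.manuscript}.{self.milestone}.{self.edit}"

    def bump_edit(self) -> "ProjectVersion":
        return self._replace(edit=self.edit + 1)

    def bump_milestone(self) -> "ProjectVersion":
        # Assumption: a new milestone resets the edit counter.
        return self._replace(milestone=self.milestone + 1, edit=0)

    def bump_manuscript(self) -> "ProjectVersion":
        # Assumption: a new manuscript resets both lower counters.
        return ProjectVersion(self.manuscript + 1, 0, 0)


if __name__ == "__main__":
    version = ProjectVersion(0, 3, 7)
    print(version.bump_manuscript())
```

Because the version is a tuple, two versions compare in the natural order, so a reader can tell at a glance whether a manuscript, a milestone, or only the code has changed since their last visit.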

In summary, open notebook science has not really had a large positive effect on my research. I think that this is mainly because using a blog alone is not an effective method of communicating scientific progress: first, it requires substantial effort on my part to update, and second, tracking the current state of the research can be difficult. However, I still believe that the principles of open notebook science can be beneficial to my research. In the next couple of months I’ll try some new methods to see what does work.

February 1, 2008

A short essay on Open Notebook Science

As you might expect from the name, Open Notebook Science (ONS) has similarities with Open Source Software. The clearest likeness between the two is the belief that by sharing and collaborating, more can be achieved than through secrecy and competition. An open approach to software development has proven successful: the greatest achievement is the development and increasing adoption of the Linux operating system. On this foundation other applications like the Apache web server, the MySQL database, and the PHP scripting language have been built, and the combination of the four is the engine running many websites, including this one. If ONS can enjoy a fraction of the success open software does, then science can only benefit.

ONS didn’t occur spontaneously, but is a step in the liberalisation of science by the freedom that the Web allows. An early example is the arXiv.org server, started in 1991 as a repository for the physics community to share manuscripts prior to publication; 17 years later it contains ~450,000 articles. Another, often overlooked, example of openness is the free biological databases such as EMBL and GenBank, which allow unrestricted access to the genomes of all sequenced organisms. More recently, many journals have adopted open access policies, whereby all research is freely available upon publication, where previously the reader had to pay a fee. Now, increasingly, research funding bodies are also stipulating, as a condition, that any articles resulting from the research are freely available at least 6 months after publication, examples being NIH and BBSRC.

When you work in science, many ideas come from reading papers, attending talks, and speaking to colleagues in the pub. So I think it’s fair to say that we will profit from further sharing on websites, such as blogs and wikis, and the more everybody is open, the more the community benefits as a whole. Of course, being open raises questions about how scientists can still be recognised for their work, as well as how research can be commercialised. Most importantly, peer review is still the best arbiter of research quality, and raw results must be viewed with this in mind.

One of the earliest adopters of complete openness is Jean-Claude Bradley, whose own and students’ laboratory notebooks are stored on a wiki, and freely available for anyone to read – updated as results are being produced. Jean-Claude also first discussed the term “Open Notebook” in relation to this, when he defined it as the researcher’s notebook being open to the world, so that there is no insider knowledge. A small but growing number of researchers have followed Jean-Claude’s example, using blogs, wikis, and project management systems to make their research available. Examples of people using blogs to share research are Cameron Neylon and Rosie Redfield, whose research groups use blogs either as the primary lab book or as a forum for describing and discussing results. In addition to Jean-Claude Bradley, other projects using wikis for ONS are 1CellPK and Maldi. Pedro Beltrao and Jeremiah Faith use software management systems, since many tools useful for tracking software development are also applicable to bioinformatics research.

There are questions that this kind of openness generates. For instance, what do the journals think about publishing research that has already appeared on a blog? For most journals, informally posting your research online is considered in the same light as giving a talk at a conference. A few exceptions exist though, such as Cell and Lancet, but on the whole publishers like NPG, BMC and PLoS are happy with this kind of sharing, though it is always worth checking. Another question worth asking is what your University’s policy is towards intellectual property: does it belong to the researcher or the institution? This leads into another point: in the last few years researchers have been increasingly expected to consider how their work can be commercialised, but any work disclosed on a blog or wiki cannot be patented, which should be borne in mind when you post new ideas or methods. Finally, there is common sense – how do your collaborators feel about early sharing of research? Or could the work you’re posting online be considered politically sensitive, for example involving animals or embryonic stem cells?

If, after reading this and looking at ONS researcher websites, you think that ONS can be useful to you, where do you begin? In my experience a blog is a safe and easy place to start. You can discuss other people’s research, and if you feel confident you could begin to mention the results you’ve been producing. Then, depending on how you feel, move towards making your notebook entirely open, using a wiki. Services like wordpress.com and blogger.com offer easy blog creation for free, while wikispaces.com can be used to create a wiki. At the moment there is no single standard application for ONS, so a good idea is to experiment and see what suits you and your research.

So what is the future for Open Notebook Science? At present, proposals have been created for an ONS network, and a session at PSB. There are a small but increasing number of scientists who are adopting open practices into their research, while a further few follow the mantra of “no insider information” and are completely open. Returning to my point at the start of this article, Linus’s Law, named after the creator of Linux, says of open source software that “given enough eyeballs, all bugs are shallow”, and if Open Notebook Science can benefit from similar principles, this will be to the advantage of the individual as well as the community as a whole.