Friday, January 22, 2010

BioJava Hackathon - Last Day

Today was the last day of the BioJava Hackathon. It has been an exciting week, and we made progress along several lines, which I will describe in a moment. Special thanks go to Jonathan Warren for organizing the meeting room at the Sanger Institute, and to all our hackers, without whom this hackathon would not have been possible. In particular, thanks to Scooter Willis, Jules Jacobsen, Andy Yates, Jonathan Warren, Christoph Gille and Matias Piipari for participating during the week, and to our special guests Richard Holland and Jim Procter, who joined us for a day.

All the code written this week is available in the new modules whose names carry the biojava3 label. Most of the work focused on the new sequence and protein structure modules:

Sequence modules

Over the last few years there has been a lot of discussion about the way sequences are currently represented. This week the "sequence guys" among the developers therefore worked on a new design that provides a biologically meaningful representation of sequences (think central dogma). What is still missing are file parsers for the new modules; Scooter is committing the first FASTA parser as I write this. More work is required before the code is ready for the next release. Still, this is the beginning of a new data representation that should serve the code base well for the next couple of years.

Structure modules

The protein structure modules are the part of BioJava3 that is closest to being released. This week we added the CE algorithm for protein structure alignment and implemented core interfaces for a generic Model-View-Controller wrapper around various 3D visualization tools. We also added better support for chemically modified residues (such as MSE, selenomethionine) and for naturally occurring ones such as selenocysteine; both are now treated as amino acids. In addition, we refactored the code base so that the structure data model is clearly separated from the new graphical user interfaces. The new GUI module provides a convenient way to calculate and visualize protein structure alignments.
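To illustrate what "treated as amino acids" means in practice, here is a minimal sketch that maps PDB residue codes, including the modified residue MSE and selenocysteine (SEC), to one-letter amino acid codes. The class and method names (`ModifiedResidues`, `isAminoAcid`, `toOneLetter`) are hypothetical and are not the BioJava API; the lookup table is deliberately abbreviated.

```java
import java.util.Map;

// Hypothetical helper for illustration only; this is NOT the BioJava API.
class ModifiedResidues {
    // Abbreviated lookup from PDB residue codes to one-letter amino acid codes.
    // MSE (selenomethionine) is a modified methionine, so it maps to 'M'.
    // SEC (selenocysteine) is a natural amino acid with its own code, 'U'.
    private static final Map<String, Character> ONE_LETTER = Map.of(
            "ALA", 'A',
            "CYS", 'C',
            "MET", 'M',
            "MSE", 'M',
            "SEC", 'U');

    // True if the residue code is treated as an amino acid.
    static boolean isAminoAcid(String residueCode) {
        return ONE_LETTER.containsKey(residueCode);
    }

    // Returns the one-letter code, or throws for residues that are not amino acids.
    static char toOneLetter(String residueCode) {
        Character code = ONE_LETTER.get(residueCode);
        if (code == null) {
            throw new IllegalArgumentException("Not an amino acid: " + residueCode);
        }
        return code;
    }

    public static void main(String[] args) {
        String[] residues = {"MET", "MSE", "ALA", "SEC", "CYS"};
        StringBuilder sequence = new StringBuilder();
        for (String residue : residues) {
            sequence.append(toOneLetter(residue));
        }
        System.out.println(sequence); // prints MMAUC
    }
}
```

Because MSE maps to its parent amino acid, a chain containing selenomethionine yields the same one-letter sequence as its unmodified counterpart, which is what sequence-based steps such as alignment need.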

Next BioJava release (3.0)

More work is still required to bring the new sequence module to a state where it can be released. We also did not write any documentation this week, so that will have to be added later. We will try to bring the modules to a releasable state over the next few weeks. Once a module is release-ready, a detailed summary of its new features will be posted to the mailing list. In any case, there will be a BioJava 3.0 release in time for the ISMB/BOSC conference, as we have done in previous years.