Monday, January 25, 2010
In a short time, so much data has been collected that what was previously a largely empty map is now one of the most complete maps of Haiti. Michael Maron posted some before-and-after images on Flickr:
Before the earthquake (Jan 12th):
and by Jan 14th:
Beyond just marking roads, there is also an analysis of damaged buildings and displacement camps. This rendering, taken from the OSM wiki, shows damaged buildings and refugee camps mapped within OpenStreetMap.
A few other geographical projects related to Haiti are described at crisiscommons.org.
Friday, January 22, 2010
All the code that has been written is available through the new modules labeled with the biojava3 name. Most work was related to the new sequence and protein structure modules:
The protein structure modules are the part of BioJava 3 that is closest to being released. During this week we added the CE algorithm for protein structure alignment and implemented core interfaces for a generic Model View Control wrapping of various 3D visualization tools. We also added better support for chemically modified residues (like MSE) and for natural ones like selenocysteine; they are now treated as amino acids. Finally, we refactored the code base so that the structure data model is clearly separated from the new graphical user interfaces. The GUI module now provides a convenient way of calculating and visualizing protein structure alignments.
Next BioJava release (3.0)
More work is still required to bring the new sequence module to a state where it can be released. We also did not write any documentation this week, so that will have to be added later. Over the next weeks we will try to bring the modules to a releasable state. Once a module is ready for release, a detailed summary of its new features will be posted to the mailing list. In any case, there will be a BioJava 3.0 release in time for the ISMB/BOSC conference, as has been our practice in recent years.
Thursday, January 21, 2010
Wednesday, January 20, 2010
Regarding the generic Model View Control design for 3D viewers, one unsolved problem is how to deal with selections. In PyMOL and Jmol, selecting ranges, chains or atoms in proteins is done through a scripting interface. Should we offer a scripting interface (based on the syntax of one of these), or should we offer multiple select methods that accept various arguments? Jules Jacobsen wrapped the Jmol-Biojava interface using the new interface definitions for the MVC.
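To make the two options concrete, here is a minimal Java sketch. It is purely illustrative: `StructureSelector`, `ScriptingSelector` and their methods are hypothetical names, not part of BioJava, Jmol or PyMOL, and the script strings are only loosely modelled on Jmol's `select` syntax and should be checked against the viewer's documentation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names are BioJava, Jmol or PyMOL API. */
final class SelectionSketch {

    /** Viewer-independent selection interface combining both options. */
    interface StructureSelector {
        // Option A: typed select methods with explicit arguments.
        void selectChain(String chainId);

        void selectRange(String chainId, int start, int end);

        // Option B: a scripting hook that forwards a viewer-specific command.
        void evalScript(String script);
    }

    /** Adapter that turns typed selections into script strings and records them. */
    static final class ScriptingSelector implements StructureSelector {
        private final List<String> sent = new ArrayList<>();

        @Override
        public void selectChain(String chainId) {
            // Assumed Jmol-like syntax for "all residues of one chain".
            evalScript("select *:" + chainId);
        }

        @Override
        public void selectRange(String chainId, int start, int end) {
            if (start > end) {
                throw new IllegalArgumentException("start must not exceed end");
            }
            // Assumed Jmol-like syntax for a residue range on one chain.
            evalScript("select " + start + "-" + end + ":" + chainId);
        }

        @Override
        public void evalScript(String script) {
            // A real adapter would hand the script to the wrapped viewer here.
            sent.add(script);
        }

        /** Returns an immutable copy of every script sent so far. */
        List<String> sentScripts() {
            return List.copyOf(sent);
        }
    }

    public static void main(String[] args) {
        ScriptingSelector selector = new ScriptingSelector();
        selector.selectChain("A");
        selector.selectRange("B", 10, 20);
        System.out.println(selector.sentScripts());
    }
}
```

In this sketch the typed methods are a thin layer over the scripting hook, so the two options need not exclude each other. Whether that layering holds up across different viewers is exactly the open question.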
Tuesday, January 19, 2010
During the morning session we did a "Post Up", a silent and structured way of brainstorming. The goal was to come up with new requirements for pushing the sequence modules towards the state of the art. Scooter moderated a discussion in which we focused on biologically meaningful representations of biological sequences. A Chemical Compound will be at the core of any sequence representation, and we want to have different types of sequences like Chromosome sequence, Scaffold, DNA, RNA, Protein, and Sugars.
We started test-driven development of the new sequence interfaces; next we will wrap the existing sequence code with them. Here you can see us during the brainstorming session:
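As a rough illustration of what "a compound at the core of every sequence" could look like, here is a small Java sketch. All type names (`Compound`, `Nucleotide`, `AminoAcid`, `Sequence`) are hypothetical and chosen for this example only; they are not the interfaces that came out of the hackathon.

```java
import java.util.List;

/** Hypothetical sketch; these types are illustrative, not the BioJava 3 API. */
final class CompoundSequenceSketch {

    /** The chemical building block at the core of every sequence. */
    interface Compound {
        String shortName();
    }

    enum Nucleotide implements Compound {
        A, C, G, T;

        @Override
        public String shortName() {
            return name();
        }
    }

    enum AminoAcid implements Compound {
        ALA("A"), GLY("G"), SEC("U"); // SEC = selenocysteine

        private final String oneLetter;

        AminoAcid(String oneLetter) {
            this.oneLetter = oneLetter;
        }

        @Override
        public String shortName() {
            return oneLetter;
        }
    }

    /** An immutable sequence typed by the kind of compound it holds. */
    static final class Sequence<C extends Compound> {
        private final List<C> compounds;

        Sequence(List<C> compounds) {
            this.compounds = List.copyOf(compounds);
        }

        int length() {
            return compounds.size();
        }

        /** Positions are 1-based, following biological convention. */
        C compoundAt(int position) {
            return compounds.get(position - 1);
        }

        /** Concatenates the short names of all compounds. */
        String asString() {
            StringBuilder sb = new StringBuilder();
            for (C compound : compounds) {
                sb.append(compound.shortName());
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        Sequence<Nucleotide> dna = new Sequence<>(
                List.of(Nucleotide.A, Nucleotide.C, Nucleotide.G, Nucleotide.T));
        Sequence<AminoAcid> protein = new Sequence<>(
                List.of(AminoAcid.ALA, AminoAcid.SEC));
        System.out.println(dna.asString());
        System.out.println(protein.asString());
    }
}
```

The generic parameter keeps DNA and protein sequences distinct at compile time, while both share the same container code.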
On the 3D structure side of things, we added a new 3D structure-gui module that is going to provide the Model View Control interface for the various open source viewers.
Monday, January 18, 2010
We had more discussions about how to deal with the sequence modules, the bytecode dependencies of the core module, and related topics. There seems to be general agreement about moving the current sequence code out of the core module into its own space. We will continue tomorrow morning, when Richard Holland is back.
On a different side of things, Christoph Gille, Jules Jacobsen and I were discussing how to provide a Model View Control interface for using various open source 3D visualization libraries (Jmol, RCSB Libraries, Astex Viewer) together with Biojava.
We spent a lot of time in discussions today; I hope we will be able to get more code written tomorrow.
I am going to blog every day about the BioJava Hackathon, so you can stay updated with what is happening here in Cambridge.
In the morning I gave this presentation, around which we had several discussions about the most critical issues we want to solve. The issues are:
- Installation problems. Getting the latest checkout of the new Maven-based build system causes problems for some of us. Sorting out the installation procedure is a major topic of the afternoon. It works successfully with the latest Eclipse, the m2eclipse plugin and the subclipse plugin. Some of the NetBeans-based developers also reported no problems during installation.
- Features. BioJava features should become first-class citizens. This means it should be possible to instantiate them independently of sequence objects.
- Simplify Sequences: Sequences should be Strings as far as possible. Only convert them to Sequence objects if required.
- Documentation: Some of the BioJava 3 documentation is not up to date and can lead to misunderstandings. The latest BioJava 3 code is available in the trunk.
- Memory efficiency: Make sure that iterating over RichSequences is memory efficient. (Fix a memory leak there)
- Bytecode: The BioJava core module should not require the Bytecode module.
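The "Features" and "Simplify Sequences" items above can be pictured together in a small Java sketch. It rests on assumptions: the `Feature` class below is a hypothetical stand-in written for this example, not the BioJava feature API, and the 1-based inclusive coordinates are my choice for the illustration.

```java
/** Hypothetical sketch; Feature here is illustrative, not the BioJava class. */
final class FeatureSketch {

    /** A feature that exists on its own, with 1-based inclusive coordinates. */
    static final class Feature {
        private final String type;
        private final int start;
        private final int end;

        Feature(String type, int start, int end) {
            if (start < 1 || end < start) {
                throw new IllegalArgumentException("invalid range " + start + "-" + end);
            }
            this.type = type;
            this.start = start;
            this.end = end;
        }

        String type() {
            return type;
        }

        int length() {
            return end - start + 1;
        }

        /** Applies the feature to a sequence held as a plain String. */
        String extractFrom(String sequence) {
            if (end > sequence.length()) {
                throw new IllegalArgumentException("feature extends past sequence end");
            }
            return sequence.substring(start - 1, end);
        }
    }

    public static void main(String[] args) {
        // The feature is created without any sequence object ...
        Feature helix = new Feature("helix", 3, 6);
        // ... and only later applied to a sequence kept as a String.
        System.out.println(helix.extractFrom("MVLSPADKTN"));
    }
}
```

Because the feature does not hold a reference to a sequence, it can be created, stored and passed around before any sequence object exists.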
Saturday, January 16, 2010
Friday, January 15, 2010
Proteins can have various degrees of similarity. If two proteins show high similarity in their amino acid sequences, it is generally assumed that they are closely related evolutionarily. With increasing evolutionary distance the degree of similarity usually drops, but proteins can still show similar function and have a similar overall 3D structure, even if the sequence similarity is low. Detecting such remote similarities is important for inferring functional and evolutionary relationships between protein families and is a core technique in structural bioinformatics.
For the RCSB-PDB web site I have recently been working on a new all-against-all comparison of all protein chains. While protein sequence comparisons can be computed quickly, calculating protein structure alignments is much more time consuming. So far we have computed about 140 million pairwise alignments in ~100,000 CPU hours on the Open Science Grid (OSG). With the help of Chris Bizon we could easily deploy our code there, and I can highly recommend that other scientists give the OSG a try as well. A technical report about how we ran these calculations is available from here:
Sunday, January 10, 2010
In particular, I spent a lot of time porting the CE and FATCAT algorithms from C to Java and developing a new user interface. Check out the latest version at http://betastaging.rcsb.org/pdb/workbench/workbench.do (e.g. try to align 4HHB chain A and 4HHB chain B).