Tuesday, December 28, 2010

BioJava 3.0 released

Today we released BioJava 3.0. It is available from http://biojava.org/wiki/BioJava:Download.
Over the last year, BioJava has undergone a major rewrite. It has been modularized into small, reusable components, and a number of new features have been added. The new approach, modeled after Apache Commons, minimizes dependencies and makes it easier to contribute new components.

At present, the main modules are:
biojava3-core: The core module offers the basic tools required for working with biological sequences of various types (DNA, RNA, protein). Besides file parsers for popular file formats it provides efficient data structures for sequence manipulation and serialization.
biojava3-genome: The genome module provides support for reading and writing the GTF, GFF2, and GFF3 file formats.
biojava3-alignment: This module provides implementations for pairwise and multiple sequence alignments (MSA). The implementation for MSA provides a flexible and multi-threaded framework that works in linear space and that, as an option, allows the users to define anchors that are used in the build up of the multiple alignment.
biojava3-structure: The 3D protein structure module provides parsers and a data model for working with PDB and mmCIF files. New features in this release are implementations of the CE and FATCAT structural alignment algorithms and support for chemical component definition files, which allow a chemically and biologically correct representation of modified residues and ligands.
biojava3-protmod: The protein modification module can detect more than 200 protein modifications and crosslinks in 3D protein structures. It comes with an XML file and Java data structures to store information about different types of protein modifications collected from PDB, RESID, and PSI-MOD.
Not every feature of the BioJava 1.X code base was migrated over to BioJava 3.0. A modularized version of the 1.X sources is available as a new "biojava-legacy" project.
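
To give a feel for the kind of data the genome module deals with, here is a minimal sketch of splitting one GFF3 feature line into its nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes). This is plain Java and does not use the BioJava API. The class Gff3LineSketch, its Feature holder, and the sample line are made up for illustration, and the sketch ignores comments, directives, and URL-escaped characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, NOT part of BioJava: parses a single GFF3 feature line. */
class Gff3LineSketch {

    /** Minimal holder for the columns this sketch keeps. */
    static final class Feature {
        final String seqId;
        final String type;
        final int start;   // 1-based, inclusive
        final int end;     // 1-based, inclusive
        final char strand; // '+', '-', '.' or '?'
        final Map<String, String> attributes;

        Feature(String seqId, String type, int start, int end,
                char strand, Map<String, String> attributes) {
            this.seqId = seqId;
            this.type = type;
            this.start = start;
            this.end = end;
            this.strand = strand;
            this.attributes = attributes;
        }
    }

    /** Splits one feature line into its nine columns and the key=value attributes. */
    static Feature parse(String line) {
        String[] cols = line.split("\t", -1);
        if (cols.length != 9) {
            throw new IllegalArgumentException(
                    "expected 9 tab-separated columns, got " + cols.length);
        }
        // Column 9 holds semicolon-separated key=value pairs in GFF3.
        Map<String, String> attrs = new LinkedHashMap<>();
        for (String pair : cols[8].split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                attrs.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return new Feature(cols[0], cols[2],
                Integer.parseInt(cols[3]), Integer.parseInt(cols[4]),
                cols[6].charAt(0), attrs);
    }

    public static void main(String[] args) {
        // Made-up sample line, for illustration only.
        Feature f = parse("chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene0001;Name=demo");
        System.out.println(f.type + " " + f.seqId + ":" + f.start + "-" + f.end
                + " " + f.attributes);
    }
}
```

GTF and GFF2 use a different attribute syntax in the last column, so a real parser needs format-specific handling there. For anything beyond a quick experiment, the parsers in the genome module are the better choice.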

Monday, December 13, 2010

New Data Drilldown options

One of my favorite features that was recently added to the RCSB PDB site is the drilldown of search results. In this release an extension to this was added. It is now possible to drill down through EC numbers as well as through the SCOP classification.

A useful trick that some people might not have noticed yet is that the drilldown is available for the whole set of PDB entries. By clicking on the total number of entries at the top of the page, one can access this faceted browsing interface over the whole database. The screenshot below shows where you have to click to access this feature.

After clicking on the total number of entries here is the drilldown for the whole of PDB:

Saturday, December 11, 2010

Personal Structure Annotations at RCSB PDB

The latest RCSB PDB release provides the possibility of attaching personal annotations to PDB entries. If you have been using the iPhone application, you might have noticed that this feature was already introduced there a few weeks ago. Now you can also annotate your favorite proteins directly at the RCSB PDB website.

How does this work? If you are not logged into MyPDB and you view the details of an entry on the Structure Summary page, you will see something like this:

Before you can create an annotation you need to get a MyPDB account and log in. This is possible on the left-hand menu in the MyPDB box:

After logging in you can tag and annotate every entry:

I am using this tool to keep comments and notes on various PDB entries. In the future, it would be nice to be able to share those notes with some of my friends or students. Another nice feature would be the ability to attach "positional" features, e.g. to annotate active sites or domain boundaries.

Tuesday, November 16, 2010

Bioinformatics manuscript update

Today Bioinformatics released our paper: Pre-calculated protein structure alignments at the RCSB PDB website. However, something went wrong and the paper requires registration for access. We requested the open access publishing license when the paper was accepted, and I have just contacted the editors. Until this is sorted out, you can access the final version of the manuscript for free through these links:

UPDATE: by now the article is correctly available as open access.

Monday, November 1, 2010

NAR database issue 2011, new RCSB PDB paper:

Our next paper has become publicly available:

The RCSB Protein Data Bank: redesigned web site and web services

in which we describe our developments at the RCSB PDB web site during the last two years.

http://nar.oxfordjournals.org/content/early/2010/10/28/nar.gkq1021.full.html?ijkey=fKC45N3RRezzpj9&keytype=ref

Hope you find it interesting!

Saturday, October 23, 2010

At the Google Summer of Code Mentor Summit

I am spending this weekend at the Google Summer of Code Mentor Summit. It is a great event hosted at the Google campus in Mountain View. Below is more information about the sessions I am attending.


Overview of all sessions

Student retention

Notes at http://openetherpad.org/student-retention

Measuring Usability

Notes at http://etherpad.osuosl.org/gsoc2010usability

Criteria for usability:

  1. Abstraction Level
  2. Closeness of mapping
  3. Consistency
  4. Diffuseness/terseness
  5. Error-Proneness
  6. Hard mental operations
  7. Hidden dependencies
  8. Juxtaposability
  9. Premature Commitment
  10. Progressive Evaluation
  11. Role expressiveness
  12. Secondary Notation
  13. Viscosity
  14. Visibility

How to make a team use agile development, if they have never done it before
Notes at: http://typewith.me/agile

Liberate your data
Notes at: http://typewith.me/liberate-data

Sunday ...

OpenStreetMap
using Cherokee for quick OSM rendering http://code.google.com/p/cherokee/

Advanced Trolling
(This seems to be the most popular session so far ;-)
Notes at: http://etherpad.osuosl.org/advanced-trolling
the CRAPL license http://matt.might.net/articles/crapl/

Open Source Science

Jim Procter and I organized the Open Source Science session:

Monday, October 11, 2010

New Paper: Precalculated Protein Structure Alignments at the RCSB PDB website

Bioinformatics just made our latest paper available as an early preview:
Precalculated Protein Structure Alignments at the RCSB-PDB website

Summary: With the continuous growth of the RCSB Protein Data Bank (PDB), Berman et al. (2000), providing an up-to-date systematic structure comparison of all protein structures poses an ever growing challenge. Here we present a comparison tool for calculating both 1D protein sequence and 3D protein structure alignments. This tool supports various applications at the RCSB PDB website. First, a structure alignment web service calculates pairwise alignments. Second, a stand-alone application runs alignments locally and visualizes the results. Third, pre-calculated 3D structure comparisons for the whole PDB are provided and updated on a weekly basis. These three applications allow users to discover novel relationships between proteins available either at the RCSB PDB or provided by the user.

Availability and Implementation: A web user interface is available at
http://www.rcsb.org/pdb/workbench/workbench.do. The source code
is available under the LGPL license from http://www.biojava.org.
A source bundle, prepared for local execution, is available from

UPDATE: the link below should provide free access:
Read the full paper here

Improved Reporting Features at the RCSB PDB site

One of the features at the RCSB PDB site that many people are not aware of is the powerful tabular reporting tool. Any search result can be used to generate one of several reports (e.g. image collages, pre-defined reports, fully customizable tables, export to Excel; see the screenshot below).

In this release Chuxiao added better reporting for Ligands. There are also plenty of new options for the fully customizable reports, based on feedback we have received from our users.

Friday, October 8, 2010

BioJava's Google Summer of Code summary

Today, a slightly belated summary of what happened during the Google Summer of Code at the BioJava project:

Our two students Mark Chapman and Jianjiong Gao did an amazing job on their two projects "All Java Multiple Sequence Alignment" (MSA) and "Identification and Classification of Posttranslational Modification of Proteins" (PTM).

For Multiple Sequence Alignments we now have a flexible and multi-threaded MSA implementation that works in linear space and that, as an option, allows the users to define anchors that are used in the build up of the multiple alignment. The code is available as part of the new biojava3-alignment module.
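
To illustrate what "linear space" means for alignments, here is a minimal sketch that computes the optimal global alignment score of two sequences while keeping only two rows of the dynamic programming matrix in memory. This is not the code from the biojava3-alignment module. The class name LinearSpaceScoreSketch and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration, and the sketch returns only the score, not the alignment itself.

```java
/** Hypothetical illustration, NOT BioJava code: global alignment score in linear space. */
class LinearSpaceScoreSketch {

    // Assumed toy scoring scheme; real tools use substitution matrices and affine gaps.
    static final int MATCH = 1;
    static final int MISMATCH = -1;
    static final int GAP = -1;

    /** Returns the optimal global alignment score using two rows of length b.length() + 1. */
    static int globalScore(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // First row: aligning the empty prefix of a against prefixes of b costs only gaps.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j * GAP;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i * GAP;
            for (int j = 1; j <= b.length(); j++) {
                int diag = prev[j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : MISMATCH);
                int up = prev[j] + GAP;       // gap in b
                int left = curr[j - 1] + GAP; // gap in a
                curr[j] = Math.max(diag, Math.max(up, left));
            }
            // Reuse the two arrays instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(globalScore("GATTACA", "GATTACA"));
        System.out.println(globalScore("AC", "AG"));
    }
}
```

Memory use grows with the length of one sequence rather than with the product of both lengths. Recovering the actual alignment in linear space takes an additional divide-and-conquer step (Hirschberg's algorithm is the classic example), which is left out here.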

The Posttranslational Modification module (biojava3-protmod) can detect three different types of protein modifications in protein structures. It comes with an XML file and Java data structures to store information about different types of protein modifications, and contains entries from RESID, PDBCC and PSI-MOD. There is also a visualisation component to display cross-linked PTMs on a sequence viewer.

Both Mark and Jianjiong have expressed their interest in maintaining and further developing their modules, and I am looking forward to interacting more with them in the future. I want to thank the mentors and co-mentors Peter Rose, Kyle Ellrott and Scooter Willis for their help and guidance on the projects; without them this would not have been possible. Thanks also to Robert Buels and the Open Bioinformatics Foundation for organizing our applications for GSoC and, last but not least, to Google for sponsoring this Summer of Code.

Thursday, October 7, 2010

New iPhone app at RCSB PDB (beta)

The latest RCSB PDB release features a first version of an iPhone application. It is provided as an HTML5-based application, which means you can install it without going to the Apple Store. Simply point your iPhone's Safari browser to http://www.pdb.org and click "yes" a couple of times. It is best to do this while you are on a wireless connection, since the application installs some data for quicker access.

Gregg, the author of this application, also made a screencast with the installation instructions. You can watch it here:

Wednesday, October 6, 2010

New RCSB PDB Feature: Faceted Browsing

One of the features I find most exciting in the latest RCSB PDB web site release is "faceted browsing". Similar to an online shopping site, which lets you drill down through product categories, it is now possible to drill down through lists of protein structures using categories like Resolution, Organism, and Polymer Type, to name just a few.

You can easily start browsing by clicking the total number of structures at the top of every page. Since this feature became available (thanks, Dimitris!) I find myself using it all the time, and I run far fewer "advanced queries" because this new feature is so easy and quick to use. Let us know if you want to have additional categories.
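
For readers who wonder how such a drilldown works in principle, here is a minimal sketch: count how many entries fall into each value of a category, let the user pick one value, filter, and count again on the next category. This is not the RCSB PDB implementation. The class FacetSketch, its Entry type, and the entry IDs in the example are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical illustration of faceted browsing, NOT the RCSB PDB code. */
class FacetSketch {

    /** Made-up record with two categories (facets) to drill through. */
    static final class Entry {
        final String id;
        final String organism;
        final String polymerType;

        Entry(String id, String organism, String polymerType) {
            this.id = id;
            this.organism = organism;
            this.polymerType = polymerType;
        }
    }

    /** Counts the entries per facet value, sorted by value. */
    static Map<String, Long> facetCounts(List<Entry> entries, Function<Entry, String> facet) {
        return entries.stream()
                .collect(Collectors.groupingBy(facet, TreeMap::new, Collectors.counting()));
    }

    /** Keeps only the entries whose facet equals the chosen value. */
    static List<Entry> drillDown(List<Entry> entries, Function<Entry, String> facet,
                                 String value) {
        return entries.stream()
                .filter(e -> value.equals(facet.apply(e)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Placeholder data; the IDs are not real PDB entries.
        List<Entry> all = Arrays.asList(
                new Entry("E1", "Homo sapiens", "Protein"),
                new Entry("E2", "Homo sapiens", "DNA"),
                new Entry("E3", "Escherichia coli", "Protein"));

        System.out.println(facetCounts(all, e -> e.organism));

        // Pick one organism, then look at the next facet within that subset.
        List<Entry> human = drillDown(all, e -> e.organism, "Homo sapiens");
        System.out.println(facetCounts(human, e -> e.polymerType));
    }
}
```

A production site would compute these counts in the database or search index rather than in memory, but the count, pick, filter, recount loop stays the same.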

Tuesday, October 5, 2010

October release of RCSB PDB website

The latest release of the RCSB PDB website features a number of exciting new features, some of which I will present in more detail in follow-up blog postings.

Above is a screenshot of the new Category Browser for the Molecule of the Month.

Here is a list of all new features:

Molecule of the Month Improvements
PDBMobile for the iPhone
Query Result Browser Improvements
Chemical Components
Tabular Report Improvements
Comparison Tool Improvements
General Site Improvements