Friday, May 4, 2012

Systematic domain based structure alignments at the RCSB PDB

At the RCSB PDB web site we have been providing pre-calculated, systematic protein structure alignments for about two years. Every week we align newly released proteins against a representative subset of the database and try to identify related proteins based on their 3D shape.

This week we released a major upgrade to those efforts. Our pre-computed alignments now use domain information to split protein chains into smaller units. We introduced this change because many proteins are built of more than one domain. For such proteins our previous results were unclear: hits for all of a chain's domains were displayed together, which made the data harder to compare and interpret.

The new domain-based procedure uses the SCOP domain assignments, where available, to define how to break up protein chains. If a structure is too new to be annotated by SCOP (as is the case for all newly released proteins), we use a program called ProteinDomainParser to define domains based on geometric criteria. The algorithm sometimes places a break point differently from where SCOP would, but structural neighbors found with such protein fragments are still interesting.
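The fallback described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the code that runs on the RCSB PDB servers: the function `assign_domains`, the lookup table, the chain identifiers, the residue ranges, and the stub predictor are all made up for the example, and the stub merely stands in for a geometric method such as ProteinDomainParser.

```python
from typing import Callable, Dict, List, Tuple

# A domain is modelled as a list of (start, end) residue ranges, so that
# discontinuous domains can be represented as well.
Segment = Tuple[int, int]
Domain = List[Segment]


def assign_domains(
    chain_id: str,
    scop_domains: Dict[str, List[Domain]],
    predict_domains: Callable[[str], List[Domain]],
) -> Tuple[str, List[Domain]]:
    """Return (source, domains) for one chain.

    Curated SCOP assignments are preferred; if the chain has none,
    fall back to a prediction based on geometric criteria.
    """
    curated = scop_domains.get(chain_id)
    if curated:
        return "SCOP", curated
    return "predicted", predict_domains(chain_id)


def stub_predictor(chain_id: str) -> List[Domain]:
    # Hypothetical placeholder for a geometric domain parser.
    # The boundaries below are invented for illustration only.
    return [[(1, 120)], [(121, 250)]]


# Hypothetical lookup table: one older chain that already has SCOP domains.
scop_table: Dict[str, List[Domain]] = {"old1.A": [[(1, 90)], [(91, 200)]]}

source_old, domains_old = assign_domains("old1.A", scop_table, stub_predictor)
source_new, domains_new = assign_domains("new1.A", scop_table, stub_predictor)

print(source_old, len(domains_old))
print(source_new, len(domains_new))
```

Each domain returned this way would then be aligned on its own rather than as part of the full chain.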



In addition, this release of the RCSB PDB web site provides a new display of protein chains that shows how different sources annotate protein domains (see the image above). This domain summary shows SCOP domains, ProteinDomainParser domains, and Pfam domains. The example shown is a cyclodextrin glycosyltransferase (3BMV) from Thermoanaerobacterium thermosulfurigenes. It is composed of four domains, which are identified by all three data sources.