Protein Sequence and Structure Bioinformatics: August 2009

Tuesday, August 25, 2009

Hydrogen bonds in proteins

Hydrogen bonds play an important part in determining the 3D structure of proteins. The formation of H-bonds contributes to the stability of a protein. They provide a characteristic pattern in secondary structure elements that can be used for secondary structure assignment.

A hydrogen bond is formed when two electonegative atoms compete for the same hydrogen atom. While the hydrogen is covalently bound to one of the atoms, the donor, it interacts with the other atom, the acceptor. The possibility to interact with one atom, while covalently bound to another, is special for hydrogens. The electrostatic and covalent aspects of the bond cause the three atoms to stay mostly linear.

H-bonds in proteins most frequently involve the C=O and N-H groups of amino acids in the polypeptide chain. A single oxygen can simultaneously participate in two bonds.

Here again some example code that can calculate the energy of an H-bond between the N-H and C=O atoms of two amino acids using BioJava. It assumes that the coordinates for H-bonds were calculated as described in one of my earlier postings.


/** calculate HBond energy of two groups in cal/mol ...
    * see Creighton page 147 f
    *
    * Jeffrey, George A., An introduction to hydrogen bonding, Oxford University Press, 1997.
    * categorizes hbonds with donor-acceptor distances of
    * 2.2-2.5 Å as "strong, mostly covalent",
    * 2.5-3.2 Å as "moderate, mostly electrostatic",
    * 3.2-4.0 Å as "weak, electrostatic".
    *  Energies are given as 40-14, 15-4, and <4 kcal/mol respectively.
    *
    */


   public  double calculateHBondEnergy(AminoAcid one, AminoAcid two )
   throws StructureException{


      //System.out.println("calcHBondEnergy" + one + "|" + two);

      Atom N = one.getN();
      Atom H = one.getH();

      Atom O = two.getO();
      Atom C = two.getC();

      double dno = Calc.getDistance(O,N);
      double dhc = Calc.getDistance(C,H);
      double dho = Calc.getDistance(O,H);
      double dnc = Calc.getDistance(C,N);

      

      double contact = MINDIST ;

      //   there seems to be a contact!
      if ( (dno < contact) || (dhc < contact) || (dnc < contact) || (dno < contact)) {
         //System.out.println("!!! contact " + one + " " + two);
         return HBONDLOWENERGY ;
      }

      double e1 = Q / dho  - Q / dhc ;
      double e2 = Q / dnc  - Q / dno ;

      double energy = e1 + e2;

    

      // bond too weak
      if ( energy > HBONDHIGHENERGY)
         return 0;

      // test to avoid bond too strong
      if ( energy > HBONDLOWENERGY)
         return energy;

      return HBONDLOWENERGY ;

   }

Tuesday, August 18, 2009

RCSB PDB releases new version of web site.

The RCSB has released a new major version of their website.

The main new features are:

For a detailed summary of the new features go to the What's new page.

One of the new Web Widgets is used in the right hand menu of this blog...

Monday, August 17, 2009

BioJava plans for the next months

Posted this to the BioJava mailing lists:

Here a quick summary of what I propose to be our action plan for the next months for BioJava:

* I would like to call for a code-freeze in 2 weeks (or so) in order to finalize the new modularized and mavenized version of biojava for the developers. The current developmental trunk will remain permanently frozen and all future work should continue at a new location in SVN. As such it will be important that all developers commit any changes they are working on before that.

* We will update the documentation for how to obtain a new mavenized checkout on the wiki.

* After the change the new modules need to be tested and if no major problems are found, the ok will be given to continue working on the new modules (at the new location)

* All developers should obtain a new checkout.

* We need to identify sub-module leaders who will take over leadership of the sub-modules.

In order to come up with a new release of biojava we should continue development on the new modules for a few months. Talking off list with Richard Holland it looks like we will have a hackaton in January in Cambridge, U.K. (details to be finalized and announced). I suggest that we use that opportunity to focus on further developing the modules and make a new public BioJava release shortly after that.

At the present I see the following topics that would be great to work on until and during the hackaton in order to prepare a shiny new version of BioJava for public release:

+ Work on standardizing the organization of the modules (tests, examples, source, docu etc.)
+ Add new modules
+ Improve existing modules
+ Anything the module leaders deem necessary for their modules.
+ Use OSGI for visualization related modules

I can post a more detailed and specific list of things to work on if people are interested.

Sunday, August 16, 2009

How to calculate H atoms for Nitrogens in protein structures

Only few protein structures contain coordinates for hydrogen (H) atoms. This is because X-ray crystallography usually does not resolve the positions of H atoms. They have only one electron and they barely scatter X-rays. Only a few structures have been determined with a high enough resolution to place the hydrogen atoms (< 1.2 Å). Some PDB files contain H atoms that have been modeled into the protein structure. Neutron Diffraction is a method that does allow to observe H atoms. An protein that has been determined with a combination of X-ray and Neutron Diffraction and that contains hydrogen atoms is PDB ID:3HGN .

If hydrogen atoms are required for calculations (e.g. to calculate hydrogen bond energies in protein structures) they need to be inferred from the available information. Here two possible ways to approximate positions for H-atoms that are bound to the Nitrogen (N) atom of an amino acid.

Let's start with having a look at the peptide plane. We can use vectors from the plane in order to calculate H atoms.

A simple way to infer the coordinates is by putting the H on the opposite side of the O atom that is bound to C. C coordinates are from amino acid i. N, Cα atoms from amino acid i+1, for which the H atom is being calculated. For the examples below we are using BioJava. It treats every atom as a vector. This means, if the code fragments below talk about atoms it is meant as a synonym for a vector in a 3D space.


Group a  = groups[i];
Group b  = groups[i+1];

// atom will be for group b.
Atom H = calcSimpleH(a.getC(),b.getN(),b.getCA());

/** Calculates the H atom for group B (i+1) */
private static Atom calcSimpleH(Atom c,Atom o, Atom n) throws StructureException{

Atom h = Calc.substract(c,o);
     double dist = Calc.getDistance(c,o);
     //System.out.println(dist);
     double x = n.getX() + h.getX() / dist;
     double y = n.getY() + h.getY() / dist;
     double z = n.getZ() + h.getZ() / dist;

     h.setX(x);
     h.setY(y);
     h.setZ(z);

     return h;
}

One issue with this first approach is that H is not necessarily directly opposite of the C-O.

An alternative method is shown below:

Use the unit vectors NC and NCα to get the direction of the vector and substract it from N.



Group a  = groups[i];
Group b  = groups[i+1];
Atom H = calc_H(a.getC(),b.getN(),b.getCA());


/**
* Calculates the H atom for group B (i+1)
*/
private static Atom calc_H(Atom C, Atom N, Atom CA) throws StructureException
{

     Atom nc  = Calc.substract(N,C);
     Atom nca = Calc.substract(N,CA);

     Atom u_nc  = Calc.unitVector(nc)   ;
     Atom u_nca = Calc.unitVector(nca);

     Atom added = Calc.substract(u_nc,u_nca);

     //(According to Creighton the distance N-H is 1.03 +/- 0.02 Å.)
     Atom U = Calc.unitVector(added);

     Atom H = Calc.add(N,U);

     return H;
}

If you are doing your own implementation of this I recommend to add a check that the distance N-H is really 1 Angstrom. Additionally write out a modified PDB file and use a 3D viewer to verify that the atom is placed on the correct side of the N.

Protein Sequence and Structure Bioinformatics