Validation of protein structures and models
Although we were mainly interested in the validation of models built by
homology, we decided to join in a European project to validate
experimental structures. In retrospect, this was a very good thing for
our modelling project, because there were so many 'inconveniences' in
the PDB files that a thorough evaluation of the template structures
used to base models on is now a serious step in the whole modelling
process. You can see a report for every
PDB file. These reports are made with
WHAT_CHECK, the free
subset of WHAT IF.
We also made a server that combines
the results of the three major coordinate-based structure validation programs
(SURVOL, PROCHECK and WHAT_CHECK). These servers
are nowadays maintained by the MSD unit
at the EBI.
Errors in protein structures.
R.W.W. Hooft, G. Vriend, C. Sander, E.E. Abola,
Nature (1996) 381, 272-272.
This short note in NATURE describes how more than 1,000,000 problems were
detected in the PDB using the
WHAT_CHECK program. This article was submitted with the title "1000000 outliers
in protein structures", but NATURE changed the title without asking us.
We deliberately called all these problems outliers
because in most cases we can only be 90%, 99%, 99.9%, etc., sure that
a detected problem is really an error. A small but significant fraction of the
problems could represent actual features of the protein structure. One should keep in mind that a value that deviates
three sigma from the mean should show up in about 1 per 1000 of all cases.
Chapter 23 of the
WHAT IF writeup tells you which checks can be performed.
If you use our PDB structure quality reports or
the WHAT_CHECK program, please refer to this article.
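The three-sigma remark above can be checked numerically. The sketch below assumes that the checked quantity is normally distributed, which is an assumption made here for illustration and not a statement about how WHAT_CHECK derives its thresholds; the function names are hypothetical. Under that assumption, roughly 1.3 per 1000 values lie three sigma or more to one side of the mean, and roughly 2.7 per 1000 when both sides are counted.

```python
import math


def normal_tail_fraction(n_sigma, two_sided=True):
    """Fraction of normally distributed values lying at least n_sigma from the mean.

    Assumes a normal distribution; real validation parameters may deviate from it.
    """
    one_sided = 0.5 * math.erfc(n_sigma / math.sqrt(2.0))
    return 2.0 * one_sided if two_sided else one_sided


def expected_outliers(n_values, n_sigma, two_sided=True):
    """Number of values expected beyond n_sigma purely by chance."""
    return n_values * normal_tail_fraction(n_sigma, two_sided)


for k in (2, 3, 4):
    print(k, normal_tail_fraction(k, two_sided=False), normal_tail_fraction(k))
```

With many checked values per structure, a considerable number of flagged outliers is therefore expected by chance alone, which is why a detected problem is not automatically an error.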
Quality control of protein models: Directional atomic contact analysis.
G. Vriend, C. Sander.
J.Appl.Cryst. (1993) 26, 47-60.
This article describes the packing quality control module of
WHAT IF. This method is also called DACA
for Directional Atomic Contact Analysis. The idea is that the distribution
of atom types is determined around amino acid fragments. We assume that the
average distribution observed in the PDB
is representative of what can happen in nature. Unlike many other
methods, we do not spherically average the distribution around the fragment,
but we keep the x,y,z directionality of the contacts. The quality of any
structure is now easily determined by a convolution of the average distributions
and the observed contacts in the protein to be checked.
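The idea described above can be illustrated with a toy sketch. This is not the WHAT IF implementation: the grid size, the dictionary layout, the scoring by averaging, and all function names are assumptions chosen for illustration. It bins contact vectors, expressed in a fragment's local x,y,z frame, into grid cells so that directionality is kept, and then scores observed contacts against the reference distribution.

```python
import math
from collections import defaultdict

GRID = 1.0  # hypothetical grid cell size in Angstrom


def cell(vec, grid=GRID):
    """Map a contact vector (x, y, z) in the fragment's local frame to a grid cell."""
    return tuple(int(math.floor(c / grid)) for c in vec)


def build_reference(contacts_by_key):
    """Build normalised directional distributions from database contacts.

    contacts_by_key: {(fragment_type, atom_type): [local-frame vectors]}
    Returns {(fragment_type, atom_type): {cell: probability}}.
    """
    reference = {}
    for key, vectors in contacts_by_key.items():
        counts = defaultdict(int)
        for v in vectors:
            counts[cell(v)] += 1
        total = sum(counts.values())
        reference[key] = {c: n / total for c, n in counts.items()}
    return reference


def packing_score(reference, observed):
    """Average reference probability at the observed contact positions.

    observed: list of (fragment_type, atom_type, local-frame vector).
    Contacts in cells never seen in the reference contribute zero.
    """
    if not observed:
        return 0.0
    total = 0.0
    for frag, atom, v in observed:
        total += reference.get((frag, atom), {}).get(cell(v), 0.0)
    return total / len(observed)


# Made-up example data: contacts of carbon atoms around a "PHE" fragment.
ref = build_reference(
    {("PHE", "C"): [(0.5, 0.5, 0.5), (0.2, 0.1, 0.3), (3.5, 0.1, 0.1), (-1.5, 0.0, 0.0)]}
)
good = packing_score(ref, [("PHE", "C", (0.9, 0.9, 0.9))])
bad = packing_score(ref, [("PHE", "C", (0.0, 5.5, 0.0))])
print(good, bad)
```

A contact that falls where the reference distribution is dense scores high, while a contact in a direction rarely or never seen in the database scores low.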
Reconstruction of symmetry related molecules from protein data bank (PDB) files.
R. Hooft, C. Sander, G. Vriend,
J. Appl. Cryst. (1994) 27, 1006-1009.
One of the more important administrative parameters that should be correct
is the symmetry information. This article describes a couple dozen problems
that we observed while trying to correctly extract symmetry information
from PDB files. In many cases easy
solutions could be programmed, but if your program does not have this
code incorporated, you are in deep trouble because the symmetry information
is only given correctly in 85% of all PDB
files. All possible correction methods are applied automatically
when WHAT IF reads a PDB file.
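To make concrete what reconstructing symmetry-related molecules involves, here is a minimal sketch of applying one symmetry operator to fractional coordinates. It is not taken from WHAT IF; the operator string, the coordinates and the function names are hypothetical, and the parser only handles simple operators written in the usual "x,y,z" notation.

```python
import re
from fractions import Fraction


def parse_symop(text):
    """Parse an operator such as '-x,y+1/2,-z' into three (coefficients, shift) rows."""
    rows = []
    for part in text.lower().replace(" ", "").split(","):
        coeffs = {"x": Fraction(0), "y": Fraction(0), "z": Fraction(0)}
        shift = Fraction(0)
        # Split each component into signed terms, e.g. 'y+1/2' -> ('', 'y'), ('+', '1/2').
        for sign, term in re.findall(r"([+-]?)([^+-]+)", part):
            factor = -1 if sign == "-" else 1
            if term[-1] in coeffs:
                mult = Fraction(term[:-1]) if term[:-1] else Fraction(1)
                coeffs[term[-1]] += factor * mult
            else:
                shift += factor * Fraction(term)
        rows.append((coeffs, shift))
    if len(rows) != 3:
        raise ValueError("a symmetry operator needs exactly three components")
    return rows


def apply_symop(rows, frac_xyz):
    """Apply a parsed operator to fractional coordinates (x, y, z)."""
    values = dict(zip("xyz", frac_xyz))
    return tuple(
        sum(float(coeffs[a]) * values[a] for a in "xyz") + float(shift)
        for coeffs, shift in rows
    )


op = parse_symop("-x,y+1/2,-z")  # a twofold screw axis along y, for illustration
once = apply_symop(op, (0.1, 0.2, 0.3))
twice = apply_symop(op, once)
print(once, twice)
```

Applying this operator twice returns the starting position shifted by one full cell along y, which is a quick consistency check on operators extracted from a file.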
Positioning hydrogen atoms by optimizing hydrogen-bond networks in protein structures.
R.W.W.Hooft, C.Sander, G.Vriend,
PROTEINS (1996) 26, 363-376.
This article describes the HB2 options in WHAT IF. In contrast to most other hydrogen
bond calculation programs, this method tries to optimize the whole hydrogen bonding
network of the full protein, rather than just looking around for the nearest suitable
partner. Additionally, the method checks whether flipping the sidechains of Asn, Gln and
His residues might improve the hydrogen bonding network. The idea behind this is that
in most X-ray projects the difference between C, N and O atoms cannot be seen. If
in such cases the local H-bond network is too complicated for the human brain to solve,
the chance that the sidechain is put into the density 'the wrong way around' is
considerable. All PDB files were looked at with this technique and the outcome
is available on the WWW.
A scanned version of this article is available.
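The flip test can be illustrated with a deliberately simplified sketch. It is not the HB2 method: it looks only at one Asn side chain and its immediate neighbours instead of optimizing the whole network, it uses distance alone, and the cutoff, the coordinates and the function names are assumptions. It counts donor-acceptor pairs for the side chain as built and with OD1 and ND2 swapped.

```python
import math

CUTOFF = 3.5  # assumed donor-acceptor distance cutoff in Angstrom


def count_hbonds(acceptor_xyz, donor_xyz, neighbours, cutoff=CUTOFF):
    """Count neighbours that can pair with the side-chain acceptor or donor.

    neighbours: list of (role, xyz) with role 'donor' or 'acceptor'.
    A neighbouring donor pairs with the side-chain acceptor and vice versa.
    """
    n = 0
    for role, xyz in neighbours:
        if role == "donor" and math.dist(acceptor_xyz, xyz) <= cutoff:
            n += 1
        elif role == "acceptor" and math.dist(donor_xyz, xyz) <= cutoff:
            n += 1
    return n


def should_flip(od1, nd2, neighbours):
    """Compare the as-built Asn amide with the flipped one (OD1 and ND2 swapped)."""
    as_built = count_hbonds(od1, nd2, neighbours)
    flipped = count_hbonds(nd2, od1, neighbours)
    return flipped > as_built, as_built, flipped


# Made-up geometry: an acceptor sits next to OD1 and a donor next to ND2,
# so the side chain as built has no compatible partner on either side.
od1 = (0.0, 0.0, 0.0)
nd2 = (2.2, 0.0, 0.0)
neighbours = [("acceptor", (-2.8, 0.0, 0.0)), ("donor", (5.0, 0.0, 0.0))]
flip, n_built, n_flipped = should_flip(od1, nd2, neighbours)
print(flip, n_built, n_flipped)
```

A purely local count like this can disagree with a global optimization, because the best orientation of one side chain depends on the orientations chosen for its neighbours.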
Verification of protein structures: Side-chain planarity.
R.W.W. Hooft, C.Sander and G.Vriend,
J. Appl. Cryst. (1996) 29, 714-716.
This paper describes the verification of side-chain planarity by
WHAT IF and WHAT_CHECK.
It also describes the construction of a representative list of
PDB files as used in the WHAT IF database.
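One common way to quantify planarity, shown here only as an illustration and not necessarily the measure used in the paper, is the root-mean-square distance of a group of atoms from their least-squares plane. That distance equals the square root of the smallest eigenvalue of the coordinate covariance matrix. The function names and the test coordinates below are hypothetical.

```python
import math


def smallest_eigenvalue_sym3(a):
    """Smallest eigenvalue of a symmetric 3x3 matrix (closed-form trigonometric solution)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:
        # Matrix is already diagonal.
        return min(a[0][0], a[1][1], a[2][2])
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (
        b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
        - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
        + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0])
    )
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding errors
    phi = math.acos(r) / 3.0
    return q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)


def rms_out_of_plane(points):
    """RMS distance of the points from their least-squares plane."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least three atoms to define a plane")
    c = [sum(p[k] for p in points) / n for k in range(3)]
    cov = [
        [sum((p[i] - c[i]) * (p[j] - c[j]) for p in points) / n for j in range(3)]
        for i in range(3)
    ]
    return math.sqrt(max(0.0, smallest_eigenvalue_sym3(cov)))


# Four atoms puckered by +/-0.1 Angstrom around the z=0 plane.
puckered = [(1, 0, 0.1), (0, 1, -0.1), (-1, 0, 0.1), (0, -1, -0.1)]
# Four atoms lying exactly in the tilted plane x = z.
flat = [(1, 0, 1), (0, 1, 0), (-1, 0, -1), (0, -1, 0)]
print(rms_out_of_plane(puckered), rms_out_of_plane(flat))
```

Comparing such a value with its distribution over a representative set of well-refined structures would then turn it into an outlier check.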
Objectively judging the quality of a protein structure from a Ramachandran plot.
R.W.W. Hooft, C.Sander and G.Vriend,
CABIOS (1997), 13, 425-430.
This paper describes a statistical means of calculating an objective score from
a Ramachandran plot, taking the differences between residue types into account.
A scanned version of this article is available.
Some WHAT_CHECK checks explained (1).
PDB Newsletter. (1998) April volume.
In this short note we explain the cell dimension check option in
WHAT_CHECK. The full text is available.
Who checks the checkers? Four validation tools applied to eight atomic resolution structures.
K.Wilson, C.Sander, R.W.W.Hooft, G.Vriend, et al.
J.Mol.Biol. (1998) 276, 417-436.
Validating protein structures (and models) is one of our hobbies. This article (which has 20
authors, the whole validation consortium is on it...) describes the results of validating
some structures solved at atomic resolution (around 1.0 Å) that thus were supposed to be
guaranteed correct. This study revealed some problems in the ultra highly refined structures
and some problems in the refinement programs. The general conclusion should be that the
validation programs are actually working very well, and that remarks by some crystallographers that
"these validation programs give very many false positives" really are not supported by the
results of this study.