Bart Hazes wrote:

> There are many cases where people use a structure refined at high resolution as a starting molecular replacement structure for a closely related/same protein with a lower resolution data set and get substantially better R statistics than you would expect for that resolution. So one factor in the "R factor gap" is many small errors that are introduced during model building and not recognized and fixed later due to limited resolution. In a perfect world, refinement would find the global minimum but in practice all these little errors get stuck in local minima with distortions in neighboring atoms compensating for the initial error and thereby hiding their existence.
Excellent point.

On Thursday, October 28, 2010 02:49:11 pm Jacob Keller wrote:
> So let's say I take a 0.6 Ang structure, artificially introduce noise into corresponding Fobs to make the resolution go down to 2 Ang, and refine using the 0.6 Ang model--do I actually get R's better than the artificially-inflated sigmas?
> Or let's say I experimentally decrease I/sigma by attenuating the beam and collect another data set--same situation?

This I can answer based on experience. One can take the coordinates from a structure refined at near atomic resolution (~1.0A), including multiple conformations, partial occupancy waters, etc, and use it to calculate R factors against a lower resolution (say 2.5A) data set collected from an isomorphous crystal. The R factors from this total-rigid-body replacement will be better than anything you could get from refinement against the lower resolution data. In fact, refinement from this starting point will just make the R factors worse.

What this tells us is that the crystallographic residuals can recognize a better model when they see one. But our refinement programs are not good enough to produce such a better model in the first place. Worse, they are not even good enough to avoid degrading the model.

That's essentially the same thing Bart said, perhaps a little more pessimistic :-)

cheers,
Ethan

> JPK
>
> ----- Original Message -----
> From: Bart Hazes
> To: CCP4BB@JISCMAIL.AC.UK
> Sent: Thursday, October 28, 2010 4:13 PM
> Subject: Re: [ccp4bb] Against Method (R)
>
> There are many cases where people use a structure refined at high resolution as a starting molecular replacement structure for a closely related/same protein with a lower resolution data set and get substantially better R statistics than you would expect for that resolution. So one factor in the "R factor gap" is many small errors that are introduced during model building and not recognized and fixed later due to limited resolution. In a perfect world, refinement would find the global minimum but in practice all these little errors get stuck in local minima with distortions in neighboring atoms compensating for the initial error and thereby hiding their existence.
>
> Bart
>
> On 10-10-28 11:33 AM, James Holton wrote:
> It is important to remember that if you have Gaussian-distributed errors and you plot error bars between +1 sigma and -1 sigma (where "sigma" is the rms error), then you expect the "right" curve to miss the error bars about 30% of the time. This is just a property of the Gaussian distribution: you expect a certain small number of the errors to be large. If the curve passes within the bounds of every single one of your error bars, then your error estimates are either too big, or the errors have a non-Gaussian distribution.
>
> For example, if the noise in the data somehow had a uniform distribution (always between +1 and -1), then no data point will ever be "kicked" further than "1" away from the "right" curve. In this case, a data point more than "1" away from the curve is evidence that you either have the wrong model (curve), or there is some other kind of noise around (wrong "error model").
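A quick numeric check of the "about 30%" figure above: for Gaussian errors the probability of landing outside +/-1 sigma is 1 - erf(1/sqrt(2)), roughly 0.32. A minimal sketch in plain Python/NumPy (simulated data only, nothing taken from any real data set) reproduces it:

    import numpy as np
    from math import erf, sqrt

    # Probability that a Gaussian deviate falls outside +/- 1 sigma (analytical).
    p_outside = 1.0 - erf(1.0 / sqrt(2.0))          # ~0.317

    # Monte Carlo check: noisy "measurements" of a known curve, unit sigma.
    rng = np.random.default_rng(0)
    errors = rng.normal(0.0, 1.0, size=100_000)     # Gaussian noise, sigma = 1
    frac_missed = np.mean(np.abs(errors) > 1.0)     # error bar misses the true value

    print(f"analytical: {p_outside:.3f}   simulated: {frac_missed:.3f}")
    # Both come out near 0.32 -- the "about 30%" quoted above. A uniform error
    # distribution bounded at +/-1 would instead give exactly zero misses.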
> As someone who has spent a lot of time looking into how we measure intensities, I think I can say with some considerable amount of confidence that we are doing a pretty good job of estimating the errors. At least, they are certainly not off by an average of 40% (20% in F). You could do better than that estimating the intensities by eye!
>
> Everybody seems to have their own favorite explanation for what I call the "R factor gap": solvent, multi-conformer structures, absorption effects, etc. However, if you go through the literature (old and new) you will find countless attempts to include more sophisticated versions of each of these hypothetically "important" systematic errors, and in none of these cases has anyone ever presented a physically reasonable model that explained the observed spot intensities from a protein crystal to within experimental error. Or at least, if there is such a paper, I haven't seen it.
>
> Since there are so many possible things to "correct", what I would like to find is a structure that represents the transition between the "small molecule" and the "macromolecule" world. Lysozyme does not qualify! Even the famous 0.6 A structure of lysozyme (2vb1) still has a "mean absolute chi": <|Iobs-Icalc|/sig(I)> = 4.5. Also, the 1.4 A structure of the tetrapeptide QQNN (2olx) is only a little better at <|chi|> = 3.5. I realize that the "chi" I describe here is not a "standard" crystallographic statistic, and perhaps I need a statistics lesson, but it seems to me there ought to be a case where it is close to 1.
>
> -James Holton
> MAD Scientist
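For concreteness, the "mean absolute chi" described above, and the conventional R factor that the rigid-body comparison earlier in the thread refers to, can both be computed from any list of merged Iobs, sig(I) and model intensities. The sketch below is illustrative only: plain NumPy, toy data, and a single least-squares scale factor standing in for real scaling; it is not how any particular refinement program does it. With a perfect model and honestly estimated Gaussian errors, <|chi|> sits near sqrt(2/pi) ~= 0.8, i.e. close to the value of 1 the paragraph above says there ought to be a case of:

    import numpy as np

    def r_factor_and_chi(i_obs, sig_i, i_calc):
        """Conventional R factor on |F| and mean |Iobs - Icalc|/sig(I)."""
        # Put the calculated intensities on the scale of the observed ones
        # (single linear least-squares scale factor -- a stand-in for real scaling).
        k = np.sum(i_obs * i_calc) / np.sum(i_calc ** 2)
        i_scaled = k * i_calc

        # "Mean absolute chi": how many sigmas the model misses each intensity by.
        chi = np.mean(np.abs(i_obs - i_scaled) / sig_i)

        # Conventional R factor, computed on amplitudes (negative I clipped to 0).
        f_obs = np.sqrt(np.clip(i_obs, 0.0, None))
        f_calc = np.sqrt(i_scaled)
        r1 = np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs)
        return r1, chi

    # Toy data: a "perfect" model plus purely Gaussian counting-statistics noise.
    rng = np.random.default_rng(1)
    i_true = rng.uniform(100.0, 10000.0, size=5000)
    sig_i = np.sqrt(i_true)
    i_obs = i_true + rng.normal(0.0, sig_i)

    r1, chi = r_factor_and_chi(i_obs, sig_i, i_true)
    print(f"R1 = {r1:.4f}   <|chi|> = {chi:.2f}")   # <|chi|> lands near 0.8

The point of the toy example is only that when the sole discrepancy is measurement noise with correct sigmas, this statistic really does come out near 1 rather than the 3-5 reported for real structures.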
> On Thu, Oct 28, 2010 at 9:04 AM, Jacob Keller <j-kell...@fsm.northwestern.edu> wrote:
> > So I guess there is never a case in crystallography in which our models predict the data to within the errors of data collection? I guess the situation might be similar to fitting a Michaelis-Menten curve, in which the fitted line often misses the error bars of the individual points, but gets the overall pattern right. In that case, though, I don't think we say that we are inadequately modelling the data. I guess there the error bars are actually too small (are underestimated). Maybe our intensity errors are also underestimated?
> >
> > JPK
>
> On Thu, Oct 28, 2010 at 9:50 AM, George M. Sheldrick <gshe...@shelx.uni-ac.gwdg.de> wrote:
> > Not quite. I was trying to say that for good small molecule data, R1 is usually significantly less than Rmerge, but never less than the precision of the experimental data measured by 0.5*<sigmaI>/<I> = 0.5*Rsigma (or the very similar 0.5*Rpim).
> >
> > George
> >
> > Prof. George M. Sheldrick FRS
> > Dept. Structural Chemistry,
> > University of Goettingen,
> > Tammannstr. 4,
> > D37077 Goettingen, Germany
> > Tel. +49-551-39-3021 or -3068
> > Fax. +49-551-39-22582
> >
> > On Thu, 28 Oct 2010, Jacob Keller wrote:
> >
> >> So I guess a consequence of what you say is that since in cases where there is no solvent the R values are often better than the precision of the actual measurements (never true with macromolecular crystals involving solvent), perhaps our real problem might be modelling solvent? Alternatively/additionally, I wonder whether there also might be more variability molecule-to-molecule in proteins, which we may not model well either.
> >>
> >> JPK
> >>
> >> ----- Original Message ----- From: "George M. Sheldrick" <gshe...@shelx.uni-ac.gwdg.de>
> >> To: <CCP4BB@JISCMAIL.AC.UK>
> >> Sent: Thursday, October 28, 2010 4:05 AM
> >> Subject: Re: [ccp4bb] Against Method (R)
> >>
> >> > It is instructive to look at what happens for small molecules where there is often no solvent to worry about. They are often refined using SHELXL, which does indeed print out the weighted R-value based on intensities (wR2), the conventional unweighted R-value R1 (based on F) and <sigmaI>/<I>, which it calls R(sigma). For well-behaved crystals R1 is in the range 1-5% and R(merge) (based on intensities) is in the range 3-9%. As you suggest, 0.5*R(sigma) could be regarded as the lower attainable limit for R1 and this is indeed the case in practice (the factor 0.5 approximately converts from I to F). Rpim gives similar results to R(sigma); both attempt to measure the precision of the MERGED data, which are what one is refining against.
> >> >
> >> > George
> >> >
> >> > Prof. George M. Sheldrick FRS
> >> > Dept. Structural Chemistry,
> >> > University of Goettingen,
> >> > Tammannstr. 4,
> >> > D37077 Goettingen, Germany
> >> > Tel. +49-551-39-3021 or -3068
> >> > Fax. +49-551-39-22582
> >> >
> >> > On Wed, 27 Oct 2010, Ed Pozharski wrote:
> >> >
> >> > > On Tue, 2010-10-26 at 21:16 +0100, Frank von Delft wrote:
> >> > > > the errors in our measurements apparently have no
> >> > > > bearing whatsoever on the errors in our models
> >> > >
> >> > > This would mean there is no point trying to get better crystals, right? Or am I also wrong to assume that the dataset with higher I/sigma in the highest resolution shell will give me a better model?
> >> > >
> >> > > On a related point - why is Rmerge considered to be the limiting value for the R? Isn't Rmerge a poorly defined measure itself that deteriorates at least in some circumstances (e.g. increased redundancy)? Specifically, shouldn't "ideal" R approximate 0.5*<sigmaI>/<I>?
> >> > >
> >> > > Cheers,
> >> > >
> >> > > Ed.
> >> > >
> >> > > --
> >> > > "I'd jump in myself, if I weren't so good at whistling."
> >> > > Julian, King of Lemurs
> >>
> >> *******************************************
> >> Jacob Pearson Keller
> >> Northwestern University
> >> Medical Scientist Training Program
> >> Dallos Laboratory
> >> F. Searle 1-240
> >> 2240 Campus Drive
> >> Evanston IL 60208
> >> lab: 847.491.2438
> >> cel: 773.608.9185
> >> email: j-kell...@northwestern.edu
> >> *******************************************

--
Ethan A Merritt
Biomolecular Structure Center, K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742
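The factor of 0.5 in the limit quoted above (and in the closing question about "ideal" R) is ordinary first-order error propagation from intensities to amplitudes: with F = sqrt(I), the relative error sigma(F)/F is about half of sigma(I)/I, so a residual computed on F can approach roughly half the relative precision of the merged intensities. A minimal numeric illustration with made-up values only:

    import numpy as np

    # First-order error propagation: F = sqrt(I) => sigma(F) = sigma(I) / (2*sqrt(I)),
    # so the *relative* error in F is half the relative error in I.
    I = 4000.0                      # an arbitrary merged intensity (made-up value)
    sig_I = 200.0                   # its standard uncertainty: 5% relative error

    F = np.sqrt(I)
    sig_F = sig_I / (2.0 * np.sqrt(I))

    print(sig_I / I)                # 0.050 -- relative precision of I (Rsigma-like)
    print(sig_F / F)                # 0.025 -- half of it, i.e. 0.5 * sigma(I)/I

    # Averaged over a whole data set, this is why 0.5*<sigmaI>/<I> (= 0.5*Rsigma,
    # or similarly 0.5*Rpim) is treated as the floor that R1, computed on F,
    # can realistically approach.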