Bart Hazes wrote:

> There are many cases where people use a structure refined at high resolution as a starting molecular replacement structure for a closely related/same protein with a lower resolution data set and get substantially better R statistics than you would expect for that resolution. So one factor in the "R factor gap" is many small errors that are introduced during model building and not recognized and fixed later due to limited resolution. In a perfect world, refinement would find the global minimum but in practice all these little errors get stuck in local minima with distortions in neighboring atoms compensating for the initial error and thereby hiding their existence.
Excellent point.

On Thursday, October 28, 2010 02:49:11 pm Jacob Keller wrote:
> So let's say I take a 0.6 Ang structure, artificially introduce noise into corresponding Fobs to make the resolution go down to 2 Ang, and refine using the 0.6 Ang model--do I actually get R's better than the artificially-inflated sigmas?
> Or let's say I experimentally decrease I/sigma by attenuating the beam and collect another data set--same situation?

This I can answer based on experience. One can take the coordinates from a structure refined at near atomic resolution (~1.0A), including multiple conformations, partial occupancy waters, etc, and use it to calculate R factors against a lower resolution (say 2.5A) data set collected from an isomorphous crystal. The R factors from this total-rigid-body replacement will be better than anything you could get from refinement against the lower resolution data. In fact, refinement from this starting point will just make the R factors worse.

What this tells us is that the crystallographic residuals can recognize a better model when they see one. But our refinement programs are not good enough to produce such a better model in the first place. Worse, they are not even good enough to avoid degrading the model.

That's essentially the same thing Bart said, perhaps a little more pessimistic :-)

cheers,
Ethan

> JPK
>
> ----- Original Message -----
> From: Bart Hazes
> To: CCP4BB@JISCMAIL.AC.UK
> Sent: Thursday, October 28, 2010 4:13 PM
> Subject: Re: [ccp4bb] Against Method (R)
>
> There are many cases where people use a structure refined at high resolution as a starting molecular replacement structure for a closely related/same protein with a lower resolution data set and get substantially better R statistics than you would expect for that resolution. So one factor in the "R factor gap" is many small errors that are introduced during model building and not recognized and fixed later due to limited resolution. In a perfect world, refinement would find the global minimum but in practice all these little errors get stuck in local minima with distortions in neighboring atoms compensating for the initial error and thereby hiding their existence.
>
> Bart
>
> On 10-10-28 11:33 AM, James Holton wrote:
> It is important to remember that if you have Gaussian-distributed errors and you plot error bars between +1 sigma and -1 sigma (where "sigma" is the rms error), then you expect the "right" curve to miss the error bars about 30% of the time. This is just a property of the Gaussian distribution: you expect a certain small number of the errors to be large. If the curve passes within the bounds of every single one of your error bars, then your error estimates are either too big, or the errors have a non-Gaussian distribution.
>
> For example, if the noise in the data somehow had a uniform distribution (always between +1 and -1), then no data point will ever be "kicked" further than "1" away from the "right" curve. In this case, a data point more than "1" away from the curve is evidence that you either have the wrong model (curve), or there is some other kind of noise around (wrong "error model").
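A quick numeric check of the "about 30%" figure above: for Gaussian errors the probability of landing outside +/-1 sigma is 1 - erf(1/sqrt(2)), roughly 0.32. A minimal sketch in plain Python/NumPy (simulated data only, nothing taken from any real data set) reproduces it:

    import numpy as np
    from math import erf, sqrt

    # Probability that a Gaussian deviate falls outside +/- 1 sigma (analytical).
    p_outside = 1.0 - erf(1.0 / sqrt(2.0))          # ~0.317

    # Monte Carlo check: noisy "measurements" of a known curve, unit sigma.
    rng = np.random.default_rng(0)
    errors = rng.normal(0.0, 1.0, size=100_000)     # Gaussian noise, sigma = 1
    frac_missed = np.mean(np.abs(errors) > 1.0)     # error bar misses the true value

    print(f"analytical: {p_outside:.3f}   simulated: {frac_missed:.3f}")
    # Both come out near 0.32 -- the "about 30%" quoted above. A uniform error
    # distribution bounded at +/-1 would instead give exactly zero misses.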
> As someone who has spent a lot of time looking into how we measure intensities, I think I can say with some considerable amount of confidence that we are doing a pretty good job of estimating the errors. At least, they are certainly not off by an average of 40% (20% in F). You could do better than that estimating the intensities by eye!
>
> Everybody seems to have their own favorite explanation for what I call the "R factor gap": solvent, multi-conformer structures, absorption effects, etc. However, if you go through the literature (old and new) you will find countless attempts to include more sophisticated versions of each of these hypothetically "important" systematic errors, and in none of these cases has anyone ever presented a physically reasonable model that explained the observed spot intensities from a protein crystal to within experimental error. Or at least, if there is such a paper, I haven't seen it.
>
> Since there are so many possible things to "correct", what I would like to find is a structure that represents the transition between the "small molecule" and the "macromolecule" world. Lysozyme does not qualify! Even the famous 0.6 A structure of lysozyme (2vb1) still has a "mean absolute chi": <|Iobs-Icalc|/sig(I)> = 4.5. Also, the 1.4 A structure of the tetrapeptide QQNN (2olx) is only a little better at <|chi|> = 3.5. I realize that the "chi" I describe here is not a "standard" crystallographic statistic, and perhaps I need a statistics lesson, but it seems to me there ought to be a case where it is close to 1.
>
> -James Holton
> MAD Scientist
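For concreteness, the "mean absolute chi" described above, and the conventional R factor that the rigid-body comparison earlier in the thread refers to, can both be computed from any list of merged Iobs, sig(I) and model intensities. The sketch below is illustrative only: plain NumPy, toy data, and a single least-squares scale factor standing in for real scaling; it is not how any particular refinement program does it. With a perfect model and honestly estimated Gaussian errors, <|chi|> sits near sqrt(2/pi) ~= 0.8, i.e. close to the value of 1 the paragraph above says there ought to be a case of:

    import numpy as np

    def r_factor_and_chi(i_obs, sig_i, i_calc):
        """Conventional R factor on |F| and mean |Iobs - Icalc|/sig(I)."""
        # Put the calculated intensities on the scale of the observed ones
        # (single linear least-squares scale factor -- a stand-in for real scaling).
        k = np.sum(i_obs * i_calc) / np.sum(i_calc ** 2)
        i_scaled = k * i_calc

        # "Mean absolute chi": how many sigmas the model misses each intensity by.
        chi = np.mean(np.abs(i_obs - i_scaled) / sig_i)

        # Conventional R factor, computed on amplitudes (negative I clipped to 0).
        f_obs = np.sqrt(np.clip(i_obs, 0.0, None))
        f_calc = np.sqrt(i_scaled)
        r1 = np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs)
        return r1, chi

    # Toy data: a "perfect" model plus purely Gaussian counting-statistics noise.
    rng = np.random.default_rng(1)
    i_true = rng.uniform(100.0, 10000.0, size=5000)
    sig_i = np.sqrt(i_true)
    i_obs = i_true + rng.normal(0.0, sig_i)

    r1, chi = r_factor_and_chi(i_obs, sig_i, i_true)
    print(f"R1 = {r1:.4f}   <|chi|> = {chi:.2f}")   # <|chi|> lands near 0.8

The point of the toy example is only that when the sole discrepancy is measurement noise with correct sigmas, this statistic really does come out near 1 rather than the 3-5 reported for real structures.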
> On Thu, Oct 28, 2010 at 9:04 AM, Jacob Keller <j-kell...@fsm.northwestern.edu> wrote:
> > So I guess there is never a case in crystallography in which our models predict the data to within the errors of data collection? I guess the situation might be similar to fitting a Michaelis-Menten curve, in which the fitted line often misses the error bars of the individual points, but gets the overall pattern right. In that case, though, I don't think we say that we are inadequately modelling the data. I guess there the error bars are actually too small (are underestimated). Maybe our intensity errors are also underestimated?
> >
> > JPK
>
> On Thu, Oct 28, 2010 at 9:50 AM, George M. Sheldrick <gshe...@shelx.uni-ac.gwdg.de> wrote:
> > Not quite. I was trying to say that for good small molecule data, R1 is usually significantly less than Rmerge, but never less than the precision of the experimental data measured by 0.5*<sigmaI>/<I> = 0.5*Rsigma (or the very similar 0.5*Rpim).
> >
> > George
> >
> > Prof. George M. Sheldrick FRS
> > Dept. Structural Chemistry,
> > University of Goettingen,
> > Tammannstr. 4,
> > D37077 Goettingen, Germany
> > Tel. +49-551-39-3021 or -3068
> > Fax. +49-551-39-22582
> >
> > On Thu, 28 Oct 2010, Jacob Keller wrote:
> >
> >> So I guess a consequence of what you say is that since in cases where there is no solvent the R values are often better than the precision of the actual measurements (never true with macromolecular crystals involving solvent), perhaps our real problem might be modelling solvent? Alternatively/additionally, I wonder whether there also might be more variability molecule-to-molecule in proteins, which we may not model well either.
> >>
> >> JPK
> >>
> >> ----- Original Message ----- From: "George M. Sheldrick" <gshe...@shelx.uni-ac.gwdg.de>
> >> To: <CCP4BB@JISCMAIL.AC.UK>
> >> Sent: Thursday, October 28, 2010 4:05 AM
> >> Subject: Re: [ccp4bb] Against Method (R)
> >>
> >> > It is instructive to look at what happens for small molecules where there is often no solvent to worry about. They are often refined using SHELXL, which does indeed print out the weighted R-value based on intensities (wR2), the conventional unweighted R-value R1 (based on F) and <sigmaI>/<I>, which it calls R(sigma). For well-behaved crystals R1 is in the range 1-5% and R(merge) (based on intensities) is in the range 3-9%. As you suggest, 0.5*R(sigma) could be regarded as the lower attainable limit for R1 and this is indeed the case in practice (the factor 0.5 approximately converts from I to F). Rpim gives similar results to R(sigma); both attempt to measure the precision of the MERGED data, which are what one is refining against.
> >> >
> >> > George
> >> >
> >> > Prof. George M. Sheldrick FRS
> >> > Dept. Structural Chemistry,
> >> > University of Goettingen,
> >> > Tammannstr. 4,
> >> > D37077 Goettingen, Germany
> >> > Tel. +49-551-39-3021 or -3068
> >> > Fax. +49-551-39-22582
> >> >
> >> > On Wed, 27 Oct 2010, Ed Pozharski wrote:
> >> >
> >> > > On Tue, 2010-10-26 at 21:16 +0100, Frank von Delft wrote:
> >> > > > the errors in our measurements apparently have no
> >> > > > bearing whatsoever on the errors in our models
> >> > >
> >> > > This would mean there is no point trying to get better crystals, right? Or am I also wrong to assume that the dataset with higher I/sigma in the highest resolution shell will give me a better model?
> >> > >
> >> > > On a related point - why is Rmerge considered to be the limiting value for the R? Isn't Rmerge a poorly defined measure itself that deteriorates at least in some circumstances (e.g. increased redundancy)? Specifically, shouldn't "ideal" R approximate 0.5*<sigmaI>/<I>?
> >> > >
> >> > > Cheers,
> >> > >
> >> > > Ed.
> >> > >
> >> > > --
> >> > > "I'd jump in myself, if I weren't so good at whistling."
> >> > > Julian, King of Lemurs
> >>
> >> *******************************************
> >> Jacob Pearson Keller
> >> Northwestern University
> >> Medical Scientist Training Program
> >> Dallos Laboratory
> >> F. Searle 1-240
> >> 2240 Campus Drive
> >> Evanston IL 60208
> >> lab: 847.491.2438
> >> cel: 773.608.9185
> >> email: j-kell...@northwestern.edu
> >> *******************************************

--
Ethan A Merritt
Biomolecular Structure Center, K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742
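The factor of 0.5 in the limit quoted above (and in the closing question about "ideal" R) is ordinary first-order error propagation from intensities to amplitudes: with F = sqrt(I), the relative error sigma(F)/F is about half of sigma(I)/I, so a residual computed on F can approach roughly half the relative precision of the merged intensities. A minimal numeric illustration with made-up values only:

    import numpy as np

    # First-order error propagation: F = sqrt(I) => sigma(F) = sigma(I) / (2*sqrt(I)),
    # so the *relative* error in F is half the relative error in I.
    I = 4000.0                      # an arbitrary merged intensity (made-up value)
    sig_I = 200.0                   # its standard uncertainty: 5% relative error

    F = np.sqrt(I)
    sig_F = sig_I / (2.0 * np.sqrt(I))

    print(sig_I / I)                # 0.050 -- relative precision of I (Rsigma-like)
    print(sig_F / F)                # 0.025 -- half of it, i.e. 0.5 * sigma(I)/I

    # Averaged over a whole data set, this is why 0.5*<sigmaI>/<I> (= 0.5*Rsigma,
    # or similarly 0.5*Rpim) is treated as the floor that R1, computed on F,
    # can realistically approach.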