Somebody sent this to me after a previous post a while back--a sort of case-study:
Wang, J. (2010). Inclusion of weak high-resolution X-ray data for improvement of a group II intron structure. Acta Crystallographica Section D: Biological Crystallography 66, 988-1000.

JPK

On Mon, Jan 30, 2012 at 4:03 AM, Frank von Delft <frank.vonde...@sgc.ox.ac.uk> wrote:

Hi Randy - thank you for a very interesting reminder of old literature.

I'm intrigued: how come this apparently excellent idea has not become standard best practice in the 14 years since it was published?

phx

On 30/01/2012 09:40, Randy Read wrote:

Hi,

Here are a couple of links on the idea of judging resolution by a type of cross-validation with data not used in refinement:

Ling et al., 1998: http://pubs.acs.org/doi/full/10.1021/bi971806n
Brunger et al., 2008: http://journals.iucr.org/d/issues/2009/02/00/ba5131/index.html (cites earlier relevant papers from Brunger's group)

Best wishes,

Randy Read

On 30 Jan 2012, at 07:09, arka chakraborty wrote:

Hi all,

In the context of the ongoing discussion, can anybody post links to a few relevant articles?

Thanks in advance,

ARKO

--
ARKA CHAKRABORTY
CAS in Crystallography and Biophysics
University of Madras
Chennai, India

On Mon, Jan 30, 2012 at 3:05 AM, Randy Read <rj...@cam.ac.uk> wrote:

Just one thing to add to that very detailed response from Ian.

We've tended to use a slightly different approach to determining a sensible resolution cutoff: we judge whether there is useful information in the highest-resolution data by whether it agrees with structure factors calculated from a model that has not been refined against those data. We first did this with the complex of the Shiga-like toxin B-subunit pentamer with the Gb3 trisaccharide (Ling et al., 1998). From memory, the point where the average I/sig(I) drops below 2 was around 3.3 A. However, we had a good molecular replacement model to solve this structure, and after carrying out only rigid-body refinement we computed a SigmaA plot using data to the edge of the detector (somewhere around 2.7 A, again from memory). The SigmaA plot dropped off smoothly to 2.8 A resolution, with values well above zero (indicating significantly better than random agreement), then dropped suddenly. So we chose 2.8 A as the cutoff. Because there were four pentamers in the asymmetric unit, we could then use 20-fold NCS averaging, which gave a fantastic map. In this case the averaging certainly helped to pull something very useful out of a very weak signal, because the maps weren't nearly as clear at lower resolution.

Since then, a number of other people have applied similar tests. Notably, Axel Brunger has done some careful analysis to show that it can indeed be useful to take data beyond the conventional limits.

When you don't have a great MR model, you can do something similar by limiting the resolution for the initial refinement and rebuilding, then assessing whether there is useful information at higher resolution by using the improved model (which hasn't seen the higher-resolution data) to compute Fcalcs. By the way, it isn't necessary to use a SigmaA plot -- the correlation between Fo and Fc probably works just as well. Note that, when the model has been refined against the lower-resolution data, you'll expect a drop in correlation at the resolution cutoff used for refinement, unless you use only the cross-validation (free) data within the resolution range used in refinement.

-----
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research, Wellcome Trust/MRC Building, Hills Road, Cambridge CB2 0XY, U.K.
Tel: +44 1223 336500   Fax: +44 1223 336827   E-mail: rj...@cam.ac.uk
www-structmed.cimr.cam.ac.uk
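[As a minimal sketch of the kind of check Randy describes (the simpler Fo/Fc correlation variant, not the actual SigmaA calculation from Ling et al.), the following Python snippet bins reflections by resolution and reports the correlation between observed and calculated amplitudes in each shell. The column names and the idea of starting from a pre-tabulated list of Fobs/Fcalc pairs are assumptions made for the example, not the output of any particular program.]

```python
# Sketch: shell-by-shell correlation between Fobs and Fcalc, where Fcalc comes
# from a model that has NOT been refined against the high-resolution data.
# Shells with correlation well above zero suggest there is still useful signal.
# The input layout (a CSV with columns 'd', 'Fobs', 'Fcalc') is an assumption.
import numpy as np
import pandas as pd

def fo_fc_correlation_by_shell(table, n_shells=20):
    """table: DataFrame with columns 'd' (resolution in Angstrom), 'Fobs', 'Fcalc'."""
    s2 = 1.0 / table["d"] ** 2                     # bin on 1/d^2
    shells = pd.qcut(s2, n_shells, labels=False, duplicates="drop")
    rows = []
    for _, grp in table.groupby(shells):
        cc = np.corrcoef(grp["Fobs"], grp["Fcalc"])[0, 1] if len(grp) > 2 else np.nan
        rows.append({"d_min": grp["d"].min(), "n_refl": len(grp), "cc_fo_fc": cc})
    # Order shells from low to high resolution.
    return pd.DataFrame(rows).sort_values("d_min", ascending=False)

# Usage (hypothetical file name):
# print(fo_fc_correlation_by_shell(pd.read_csv("fobs_fcalc.csv")))
```

[As noted above, a drop at the refinement cutoff itself is expected unless only cross-validation data are used inside that range; the interesting question is whether the correlation stays well above zero beyond it.]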
On 29 Jan 2012, at 17:25, Ian Tickle wrote:

Jacob, here's my (personal) take on this:

The data-quality metrics that everyone uses clearly fall into two classes: 'consistency' metrics, i.e. Rmerge/meas/pim and CC(1/2), which measure how well redundant observations agree, and signal-to-noise metrics, i.e. mean(I/sigma) and completeness, which relate to the information content of the data.

IMO the basic problem with all the consistency metrics is that they are not measuring the quantity that is relevant to refinement and electron density maps, namely the information content of the data, at least not in a direct and meaningful way. This is because there are two contributors to any consistency metric: the systematic errors (e.g. differences in illuminated volume and absorption) and the random errors (from counting statistics, detector noise, etc.). If the data are collected with sufficient redundancy, the systematic errors should hopefully largely cancel, and therefore only the random errors will determine the information content. The systematic-error component of the consistency measure (which I suspect is the biggest component, at least for the strong reflections) is therefore not relevant to measuring the information content. If the consistency measure took into account only the random-error component (which it can't), it would essentially be a measure of information content, if only indirectly (but then why not simply use a direct measure such as the signal-to-noise ratio?).

There are clearly at least two distinct problems with Rmerge: first, it includes systematic errors in its measure of consistency; second, it is not invariant with respect to the redundancy (and third, it's useless as a statistic anyway because you can't do any significance tests on it!). The redundancy problem is fixed to some extent by Rpim etc., but that still leaves the other problems. It's not clear to me that CC(1/2) is any better in this respect, since (as far as I understand how it's implemented) one cannot be sure that the systematic errors will cancel for each half-dataset Imean, so it is still likely to contain a large contribution from the irrelevant systematic-error component and so mislead about the real data quality in exactly the same way that Rmerge/meas/pim do. One may as well use the Rmerge between the half-dataset Imeans, since there would be no redundancy effect (i.e. the redundancy would be 2 for all included reflections).

I did some significance tests on CC(1/2) and got silly results: for example, the significance threshold for the CC comes out at ~0.1, but that corresponded to a huge Rmerge (200%) and a tiny mean(I/sigma) (0.4). It seems that (without any basis in statistics whatsoever) the rule of thumb CC > 0.5 is what is generally used, but I would be worried that the statistics are so far divorced from the reality - it suggests that something is seriously wrong with the assumptions!
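[To make the half-dataset idea concrete, here is a rough sketch (an illustration only, not how any particular scaling program implements it) of CC(1/2) computed from two random half-dataset means, together with the standard t-test threshold for deciding when a Pearson correlation differs significantly from zero. The input layout, a dict mapping each unique reflection to its list of redundant intensity measurements, is an assumption for the example.]

```python
# Sketch: CC(1/2) from two random half-dataset means, plus the textbook
# significance threshold for a Pearson correlation of n pairs.
import numpy as np
from scipy import stats

def cc_half(obs_by_refl, seed=0):
    """obs_by_refl: dict mapping a unique hkl to a list of redundant I measurements."""
    rng = np.random.default_rng(seed)
    half1, half2 = [], []
    for intensities in obs_by_refl.values():
        i = np.asarray(intensities, dtype=float)
        if i.size < 2:
            continue                          # need at least two observations to split
        perm = rng.permutation(i.size)
        mid = i.size // 2
        half1.append(i[perm[:mid]].mean())    # mean of first random half
        half2.append(i[perm[mid:]].mean())    # mean of second random half
    return np.corrcoef(half1, half2)[0, 1], len(half1)

def cc_significance_threshold(n_pairs, alpha=0.001):
    """Smallest correlation that differs from zero at level alpha, given n_pairs pairs."""
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df=n_pairs - 2)
    return t_crit / np.sqrt(n_pairs - 2 + t_crit ** 2)
```

[For a shell of roughly a thousand pairs at the 0.1% level this threshold does come out at about 0.1, consistent with the figure quoted above, which illustrates the complaint: a CC that is statistically significant can still correspond to a very weak mean(I/sigma).]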
Having said all that, the mean(I/sigma) metric, which on the face of it is much more closely related to the information content and therefore should be a more relevant metric than Rmerge/meas/pim and CC(1/2), is not without its own problems (which probably explains the continuing popularity of the other metrics!). First, and most obviously, it is hostage to the estimate of sigma(I) used. I've never been happy with inflating the counting sigmas to include the effects of systematic error based on the consistency of redundant measurements, since, as I indicated above, if the data are collected redundantly in such a way that the systematic errors largely cancel, the systematic errors should not be included in the estimate of sigma. The fact that the sigma(I)'s would then generally be smaller (at least for the large I's), so that the sample variances would be much larger than the counting variances, is irrelevant, because the former include the systematic errors. Also, the I/sigma cutoff used would probably not need to be changed, since it affects only the weakest reflections, which are largely unaffected by the systematic-error correction.

The second problem with mean(I/sigma) is also obvious: it's a mean, and as such it is rather insensitive to the actual distribution of I/sigma(I). For example, if a shell contained a few highly significant intensities, these could be overwhelmed by a large number of weak data and give an insignificant mean(I/sigma). It seems to me that one should be considering the significance of individual reflections, not the shell averages. Also, the average will depend on the width of the resolution bin, so one gets the strange effect that the apparent resolution depends on how one bins the data! The assumption made in taking the bin average is that I/sigma(I) falls off smoothly with d*, but that's unlikely to be the reality.
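[To illustrate that last point (purely an example, not a proposed metric), the sketch below reports, for each resolution shell, both the mean I/sigma and the number of reflections that are individually significant. The column names ('d', 'I', 'sigI') and the 3-sigma threshold are assumptions for the example.]

```python
# Sketch: per-shell mean(I/sigma) alongside the count of individually
# significant reflections, to show what a plain shell average can hide.
import pandas as pd

def shell_summary(table, n_shells=20, strong_cut=3.0):
    """table: DataFrame with columns 'd' (Angstrom), 'I', 'sigI'."""
    s2 = 1.0 / table["d"] ** 2
    table = table.assign(
        i_over_sig=table["I"] / table["sigI"],
        shell=pd.qcut(s2, n_shells, labels=False, duplicates="drop"),
    )
    return table.groupby("shell").agg(
        d_min=("d", "min"),                   # high-resolution edge of the shell
        n_refl=("I", "size"),
        mean_i_over_sig=("i_over_sig", "mean"),
        n_strong=("i_over_sig", lambda x: int((x > strong_cut).sum())),
    )
```

[A shell whose mean(I/sigma) falls below 2 can still contain tens of reflections above 3 sigma, which the shell average alone would hide, and changing n_shells shifts where the apparent cutoff lands.]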
It seems to me that a chi-squared statistic which takes into account the actual distribution of I/sigma(I) would be a better bet than the bin average, though it's not entirely clear how one would formulate such a metric. One would have to consider subsets of the data as a whole, sorted by increasing d* (i.e. not in resolution bins, to avoid the 'bin-averaging effect' described above), and apply the resolution cutoff where the chi-squared statistic has maximum probability. This would automatically take care of incompleteness effects, since all unmeasured reflections would be included with I/sigma = 0 just for the purposes of working out the cutoff point. I've skipped the details of implementation and I've no idea how it would work in practice!

An obvious question is: do we really need to worry about the exact cutoff anyway? Won't our sophisticated maximum-likelihood refinement programs handle the weak data correctly? In theory weak intensities should be handled correctly; the problem may instead lie with incorrectly estimated sigmas, which are obviously much more of an issue for any software that depends critically on accurate estimates of uncertainty!

I did some tests where I refined data for a known protein-ligand complex using the original apo model and looked at the difference density for the ligand, using data cut at 2.5, 2 and 1.5 A, where the standard metrics strongly suggested there was only data to 2.5 A.

I have to say that the differences were tiny, well below what I would deem significant (i.e. not only the map resolutions but all the map details were essentially the same), and I would certainly question whether it was worth all the soul-searching on this topic over the years! So it seems that the refinement programs do indeed handle weak data correctly, but I guess this should hardly come as a surprise (well done to the software developers anyway!). This was actually using Buster; Refmac seems to have more of a problem with scaling and TLS if you include a load of high-resolution junk data. However, before anyone acts on this information I would _very_ strongly advise them to repeat the experiment and verify the results for themselves! The bottom line may be that the actual cutoff used only matters for the purpose of quoting the true resolution of the map; it doesn't significantly affect the appearance of the map itself.

Finally, an effect which confounds all the quality metrics is data anisotropy: ideally the cutoff surface of significance in reciprocal space should perhaps be an ellipsoid, not a sphere. I know there are several programs for anisotropic scaling, but I'm not aware of any that apply anisotropic resolution cutoffs (or even whether this would be advisable).

Cheers

-- Ian

On 27 January 2012 17:47, Jacob Keller <j-kell...@fsm.northwestern.edu> wrote:

Dear Crystallographers,

I cannot think why any of the various flavors of Rmerge/meas/pim should be used as a data cutoff rather than simply I/sigma - can somebody make a good argument or point me to a good reference? My thinking is that a signal-to-noise ratio of >2 is definitely still signal, no matter what the R values are. Am I wrong? I was also thinking that the R-value cutoff was possibly a historical accident/expedient from when one tried to limit the amount of data in the face of limited computational power - true? So perhaps now, when computers are so much more powerful, we have the luxury of including more weak data?

JPK

--
*******************************************
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
email: j-kell...@northwestern.edu
*******************************************