Re: [ccp4bb] Free Reflections as Percent and not a Number

Dirk Kostrewa Tue, 25 Nov 2014 09:31:49 -0800

Dear CCP4ers,

I can only second Ian's observation: in my experience, too, it is sufficient to run enough cycles of conventional refinement to get rid of any previous bias after changing the free set of reflections or omitting parts of the structure.

I don't have the results anymore, but if my memory serves me correctly, I once compared simulated annealing with many cycles of conventional refinement in X-PLOR after removal of 5% of the model and monitored the differences in Rfree-R, both converging to very similar values (the absolute values for R and Rfree were worse for simulated annealing).

I also think that neither simulated annealing nor jiggling is necessary to get rid of previous bias.


Best regards,

Dirk.

On 25.11.2014 17:03, Ian Tickle wrote:

Dear All
I'd like to raise the question again of whether any of this 'jiggling' (i.e. addition of random noise to the co-ordinates) is really necessary anyway, notwithstanding Dale's valid point that even if it were necessary, jiggling in its present incarnation is unlikely to work because it's unlikely to erase the influence of low res. reflexions.
My claim is that jiggling is completely unnecessary, because I maintain that refinement to convergence is all that is required to remove the bias when an alternate test set is selected. In fact I claim that it's the refinement, not the jiggling, that's wholly responsible for removing the bias. I know we thrashed this out a while back and I recall that the discussion ended with a challenge to me to prove my claim that the refine-only Rfrees are indeed unbiased. I couldn't see an easy way of doing this which didn't involve rebuilding and re-refining the same structure 20 times over, without introducing any observer bias.
The present discussion prompted me to think again about this and I believe I can prove part of my claim quite easily, that jiggling has no effect on the results. Proving that the resulting Rfrees are unbiased is much harder, since as we've seen there's no proof that jiggling actually removes the bias as claimed by its proponents. However, given that said proponents of jiggling+refinement have been happy to accept for many years that their results are unbiased, they must be equally happy now to accept that the refinement-only results are also unbiased, provided I can demonstrate that the difference between the results is insignificant.
The experimental proof rests on comparison between the Rfrees and RMSDs of the jiggled+refined and the refined-only structures for the 19 possible alternate test sets (assuming 5% test-set size). If jiggling makes no difference as I claim, then there should be no significant difference between the Rfrees, and insignificant RMSDs, for all pairs of alternate test sets.
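To make the "19 alternate test sets" concrete: a 5% test set implies 20 disjoint flag groups, one of which is the original test set. The following is only a minimal sketch of such a partition in Python, not the flagging procedure actually used for these data; the function name `assign_test_sets` and the seed are hypothetical, and the reflection count is simply the sum of the working and test set sizes quoted in this post.

```python
import random

def assign_test_sets(n_reflections, n_sets=20, seed=0):
    """Give every reflection a flag 0..n_sets-1, as evenly as possible.

    Hypothetical helper: real free-flag programs may assign flags
    differently (e.g. independently per reflection, or in thin shells).
    """
    flags = [i % n_sets for i in range(n_reflections)]
    random.Random(seed).shuffle(flags)
    return flags

n_total = 11563 + 611  # working + test set sizes quoted in this post
flags = assign_test_sets(n_total)

# Each flag value defines one possible test set; the rest is the working set.
for test_set in range(20):
    n_free = flags.count(test_set)
    print(test_set, n_free, n_total - n_free)
```

Selecting flag 0 reproduces the role of the original test set, and flags 1-19 play the role of the alternates.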
However, first we must be careful to establish what is a suitable value for the noise magnitude to add to the co-ordinates. If it's too small it won't remove the bias (again notwithstanding Dale's point that it's unlikely to have any effect anyway on the low res. data); too large and you push it beyond the convergence radius of the refinement and end up damaging the structure irretrievably (at least unless you're prepared to do significant rebuilding of the model).
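For anyone wanting to experiment with noise magnitudes outside PDBSET, here is a rough Python sketch of adding random shifts scaled to a chosen RMSD and then reporting the resulting RMSD and maximum deviation. It is an assumption-laden illustration, not PDBSET's algorithm: the Gaussian shift distribution, the function names `jiggle` and `deviations`, and the toy co-ordinates are all made up for the example.

```python
import math
import random

def jiggle(coords, target_rmsd, seed=0):
    """Add isotropic Gaussian shifts, rescaled so the RMSD is exactly target_rmsd.

    Assumption: Gaussian shifts; PDBSET's own noise model may differ.
    """
    rng = random.Random(seed)
    shifts = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in coords]
    rms = math.sqrt(sum(dx * dx + dy * dy + dz * dz
                        for dx, dy, dz in shifts) / len(shifts))
    scale = target_rmsd / rms
    return [[c + scale * s for c, s in zip(xyz, shift)]
            for xyz, shift in zip(coords, shifts)]

def deviations(a, b):
    """Return (RMSD, maximum deviation) between two co-ordinate lists."""
    d = [math.dist(p, q) for p, q in zip(a, b)]
    return math.sqrt(sum(x * x for x in d) / len(d)), max(d)

# Toy co-ordinates (hypothetical, not a real structure).
coords = [[float(i), float(i % 7), float(i % 3)] for i in range(500)]
jiggled = jiggle(coords, target_rmsd=0.25)
rmsd, max_dev = deviations(coords, jiggled)
print(f"RMSD {rmsd:.3f}  MaxDev {max_dev:.3f}")
```

Reporting both numbers matters because the RMSD and the maximum shift can differ considerably for the same noise setting.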
For the record here's the crystal info for the test data I selected:

Nres: 96   SG: P41212   Vm: 1.99   Solvent: 0.377
Resol: 40-1.58 A.
Working set size: 11563   Test set size: 611 (5%)   Test set: 0
Refinement program:     BUSTER.
Noise addition program: PDBSET.
It's wise to choose a small protein since you need to run lots of refinements! However feel free to try the same thing with your own data.
First I took care that the starting model was refined to convergence using the original test set 0, and I performed 2 sequential runs of refinement with BUSTER (the deviations are relative to the input co-ordinates in each case):
Ncyc   Rwork   Rfree   RMSD    MaxDev
  82   0.181   0.230   0.005   0.072
  51   0.181   0.231   0.002   0.015
The advantage of using BUSTER is that it has its own convergence test; with REFMAC you have to guess.
Then I tried a range of input noise values (0.20, 0.25, 0.30, 0.35, 0.40, 0.50 A) on the refined starting model. Note that these are RMSDs, not maximum shifts as claimed by the PDBSET documentation. In each case I did 4 sequential runs of BUSTER on the jiggled co-ordinates and by looking at the RMSDs and max. shifts I decided that 0.25 A RMSD was all the structure could stand without risking permanent damage (note that the default noise value in PDBSET is 0.2):
Initial RMSD: 0.248  MaxDev: 0.407

Ncyc   Rwork   Rfree   RMSD    MaxDev
 358   0.183   0.230   0.052   0.454
 126   0.181   0.232   0.041   0.383
  65   0.181   0.232   0.040   0.368
  50   0.181   0.232   0.040   0.360
The only purpose of the above refinements is to establish the most suitable noise value; the resulting refined PDB files were not used.
So then I took the co-ordinates with 0.25 A noise added and for each test set 1-19 did 2 sequential runs of BUSTER.
Finally I took the original refined starting model (i.e. without noise addition) and again refined to convergence using all 19 alternate test sets.
The results are attached. The correlation coefficient between the 2 sets of Rfrees is 0.992 and the mean RMSD between the sets is 0.04 A, so the difference between the 2 sets is indeed insignificant.
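For readers repeating this with their own data, the two summary numbers can be computed with a few lines of Python. This is a minimal sketch that assumes the correlation coefficient meant here is the ordinary Pearson coefficient; the function names are hypothetical, and since the attachment is not reproduced in this message no Rfree values are filled in.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)

def mean(values):
    """Arithmetic mean, e.g. of the per-test-set RMSDs."""
    return sum(values) / len(values)

# Usage: pass the 19 Rfrees from the jiggled+refined runs and the 19 from
# the refined-only runs to pearson(), and the 19 pairwise RMSDs to mean().
```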
I don't find this result surprising at all: provided the jiggling keeps the structure inside the convergence radius of refinement, then by definition the refinement will produce the same result irrespective of the starting point (i.e. jiggled or not). If the jiggling takes the structure outside the radius of convergence then the original structure will not be retrievable without manual rebuilding: I'm assuming that's not the goal here.
I suspect that the idea of jiggling may have come about because refinements have not always been carried through to convergence: clearly if you don't do a proper job of refinement then you must expect some of the original bias to remain. Also, to head off the suggestion that simulated annealing refinement would fix this, I would suggest that any kind of SA refinement is only of value for initial MR models when there may be significant systematic error in the model; it's not generally advisable to perform it on final refined models (jiggled or not) when there is no such systematic error present.
Cheers

-- Ian
On 21 November 2014 18:56, Dale Tronrud <de...@daletronrud.com> wrote:
    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1



    On 11/21/2014 12:35 AM, "F.Xavier Gomis-Rüth" wrote:
    > <snip...>
    >
    > As to the convenience of carrying over a test set to another
    > dataset, Eleanor made a suggestion to circumvent this necessity
    > some time ago: pass your coordinates through pdbset and add some
    > noise before refinement:
    >
    > pdbset xyzin xx.pdb xyzout yy.pdb <<eof
    > noise 0.4
    > eof
    >

       I've heard this "debiasing" procedure proposed before, but I've
    never seen a proper test showing that it works.  I'm concerned that
    this will not erase the influence of low resolution reflections that
    were in the old working set but are now in the new test set.  While
    adding 0.4 A gaussian noise to a model would cause large changes to
    the 2 A structure factors I doubt it would do much to those at 10 A.
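The size of this effect can be estimated with a back-of-the-envelope calculation. For independent, isotropic Gaussian shifts with per-axis standard deviation sigma, the expected structure factor of the shifted model is the original one multiplied by exp(-2 pi^2 sigma^2 / d^2), the same form as a Debye-Waller factor. The sketch below is only an illustration under that assumption (uncorrelated Gaussian noise; the function name `attenuation` is made up), and since it is not clear whether 0.4 A is meant per axis or as a total RMSD, it prints both readings.

```python
import math

def attenuation(sigma, d):
    """Expected scale factor on a structure factor at resolution d (A)
    after independent Gaussian shifts with per-axis std dev sigma (A)."""
    return math.exp(-2.0 * math.pi ** 2 * sigma ** 2 / d ** 2)

for d in (2.0, 10.0):
    per_axis = attenuation(0.4, d)                   # 0.4 A per axis
    total = attenuation(0.4 / math.sqrt(3.0), d)     # 0.4 A total RMSD
    print(f"d = {d:4.1f} A   per-axis: {per_axis:.2f}   total RMSD: {total:.2f}")
```

Under the per-axis reading the expected structure factor is roughly halved at 2 A but reduced by only a few percent at 10 A, which is the asymmetry described above.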

       It seems to me that one would have to have random, but correlated,
    shifts in atomic parameters to affect the low resolution data - waves
    of displacements, sometimes to the left and other times to the right.
     You would need, of course, a superposition of such waves that span
    all the scales of resolution in the data set.

       Has anyone looked at the pdbset jiggling results and shown that the
    low resolution data are scrambled?

    Dale Tronrud

    > Xavier
    >
    > On 20/11/14 11:43 PM, Keller, Jacob wrote:
    >> Dear Crystallographers,
    >>
    >> I thought that for reliable values for Rfree, one needs only to
    >> satisfy counting statistics, and therefore using at most a couple
    >> thousand reflections should always be sufficient. Almost always,
    >> however, some seemingly-arbitrary percentage of reflections is
    >> used, say 5%. Is there any rationale for using a percentage
    >> rather than some absolute number like 1000?
    >>
    >> All the best,
    >>
    >> Jacob
    >>
    >> ******************************************* Jacob Pearson Keller,
    >> PhD Looger Lab/HHMI Janelia Research Campus 19700 Helix Dr,
    >> Ashburn, VA 20147 email: kell...@janelia.hhmi.org
    >> ******************************************* .
    >>
    >
    > --
    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v2.0.22 (MingW32)

    iEYEARECAAYFAlRviu4ACgkQU5C0gGfAG12TMwCfTT0Q4yfCCOxJlRXtsCXmmp1n
    9lEAn2Ir57+Y16fh02VcsvDxwu6KYRGK
    =68gK
    -----END PGP SIGNATURE-----


--

*******************************************************
Dirk Kostrewa
Gene Center Munich, A5.07
Department of Biochemistry
Ludwig-Maximilians-Universität München
Feodor-Lynen-Str. 25
D-81377 Munich
Germany
Phone:  +49-89-2180-76845
Fax:    +49-89-2180-76999
E-mail: kostr...@genzentrum.lmu.de
WWW:    www.genzentrum.lmu.de
*******************************************************
