Re: [ccp4bb] Free Reflections as Percent and not a Number

Ian Tickle Wed, 26 Nov 2014 06:53:31 -0800

Oops sorry PDBSET allows input RMSD up to 0.5 A.

Cheers


-- Ian

On 26 November 2014 at 14:36, Ian Tickle <ianj...@gmail.com> wrote:

>
> Hi Jose
>
> I think the counter-argument to that is that there are many more
> unrestrained than restrained interatomic distances (even if you include
> bond & torsion angles, planes & VDW contacts), so there are plenty of
> opportunities for the atoms to move apart by more than 2*RMSD.  In fact if
> I use RMSD = 0.4 (the max that PDBSET allows is 0.4) I see deviations up to
> 3.1 A between the refined starting model and the refined jiggled model (so
> anyone planning on manual rebuilding has their work cut out!).  So the
> geometric restraints certainly don't guarantee that the atoms will come
> back to within 0.05 A of their starting positions, even for starting RMSD =
> 0.2 A.
>
> Here are my full results showing: RMSD & MaxDev of original jiggled model
> from the starting model, and RMSD & MaxDev of refined jiggled model from
> starting model.
>
>       Jiggled (A)        Refined jiggled (A)
>       RMSD    MaxDev     RMSD    MaxDev
>       0.201   0.329      0.040   0.529
>       0.248   0.407      0.040   0.360
>       0.298   0.495      0.051   0.852
>       0.350   0.589      0.103   1.371
>       0.401   0.657      0.257   3.132
>       0.500   0.810      0.282   2.717
>
> Note that the refined RMSDs and MaxDev values increase dramatically beyond
> starting RMSD = 0.25, which is why I selected 0.25 as my noise value.  Also
> this was the only RMSD for which the MaxDev actually decreased after
> refinement.
>
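The RMSD and MaxDev figures quoted in this thread can be computed from any two coordinate sets with matching atom order. The following is a minimal sketch in plain Python, assuming the coordinates have already been read into equal-length lists of (x, y, z) tuples in A and that no superposition is needed because both models sit in the same cell. The function name `rmsd_and_maxdev` and the three-atom example are made up for illustration; they are not part of PDBSET or BUSTER.

```python
import math

def rmsd_and_maxdev(coords_a, coords_b):
    """Return (RMSD, MaxDev) between two matched lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    # Per-atom Euclidean distance between corresponding atoms.
    dists = [math.dist(a, b) for a, b in zip(coords_a, coords_b)]
    rmsd = math.sqrt(sum(d * d for d in dists) / len(dists))
    return rmsd, max(dists)

# Hypothetical three-atom example (made-up coordinates, not from the test data).
start = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
moved = [(0.0, 0.0, 0.3), (1.5, 0.0, 0.0), (1.5, 1.1, 0.0)]
rmsd, maxdev = rmsd_and_maxdev(start, moved)
```

Because MaxDev is a maximum over all atoms, a single badly displaced side chain can dominate it while the RMSD stays small, which is the pattern seen in the 0.4 A row of the table.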
> Cheers
>
> -- Ian
>
>
> On 26 November 2014 at 10:04, Seijo, Jose A. Cuesta <
> josea.cuesta.se...@carlsberglab.dk> wrote:
>
>> Hi all,
>>
>> I'd like to challenge the notion that a "jiggle" to an RMSD of 0.2Å will
>> actually move your atoms by anywhere close to 0.2Å, hence affecting at
>> least the reflections at 2Å.
>> Well, it will, but think of what happens to two atoms that were at their
>> ideal distance and then are 0.4Å further apart. The distortions to the
>> ideal bond distances and angles will be overwhelmingly the driving factor
>> during refinement, at least at first, and within the first couple of cycles
>> in the new refinement, ideal geometry will have been largely restored.
>> Since it is very unlikely that an atom's whole environment moved in the
>> same direction during the jiggle, and each atom is linked to at least 4
>> other atoms once the bond angles are counted, the geometry restraints
>> will make sure that most atoms come back to within 0.05Å of their initial
>> position. Only then will the refinement start to be dominated by the fit to
>> the structure factors (IMHO).
>>
>> Cheers,
>>
>> Jose.
>>
>> ================================
>> Jose Antonio Cuesta-Seijo, PhD
>> Carlsberg Laboratory
>> Gamle Carlsberg Vej 10
>> DK-1799 Copenhagen V
>> Denmark
>>
>> Tlf +45 3327 5332
>> Email josea.cuesta.se...@carlsberglab.dk
>> ================================
>>
>>
>> -----Original Message-----
>> From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of
>> Tim Gruene
>> Sent: Tuesday, November 25, 2014 7:41 PM
>> To: CCP4BB@JISCMAIL.AC.UK
>> Subject: Re: [ccp4bb] Free Reflections as Percent and not a Number
>>
>> Hi Ed,
>>
>> it is an easy exercise to show that theory (according to "by
>> definition") and reality greatly diverge - refinement is too complex to
>> get back to exactly the same structure. Maybe because one often does not
>> reach convergence, no matter how many cycles of refinement you run.
>>
>> Best,
>> Tim
>>
>> On 11/25/2014 07:29 PM, Edward A. Berry wrote:
>> >> provided the jiggling keeps the structure inside the convergence
>> >> radius of refinement, then by definition the refinement will produce
>> >> the same result irrespective of the starting point (i.e. jiggled or
>> >> not).  If the jiggling takes the structure outside the radius of
>> >> convergence then the original structure will not be retrievable
>> >> without manual rebuilding: I'm assuming that's not the goal here.
>> >
>> >
>> > I actually agree with this, but an R-free purist might argue that you
>> > have to get outside of radius of convergence to eliminate R-free bias.
>> > Otherwise, by definition, "you will just refine back to the same old
>> > biased structure!".
>> >   (but you have shown that the conventional .2A rms is within radius
>> > of convergence)
>> >
>> > In fact Dale's concern about low-res reflections could be put in terms
>> > of radius of convergence and false minima.
>> > Moving a lot of atoms by .2 A will have a significant effect on the
>> > phase of a 2A reflection, but almost no effect on a 20A reflection.
>> > Say you have refined against all the low resolution reflections, and
>> > got a structure that fits better than it should because it is fitting
>> > the noise in the free reflections. Now take away the free reflections
>> > and continue to refine. It will drop into the nearest local minimum,
>> > which since it is near the solution with all reflections, will still
>> > give artificially low R-free.  Jiggling by 0.2 A will have no effect
>> > because the local minima are extremely broad and shallow, as far
>> > as the low-res reflections go.
>> >
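The resolution dependence described above can be made concrete with a worked estimate. As a rough sketch, assuming a single atom displaced by a distance \Delta exactly along the scattering vector of a reflection with spacing d (the worst case; the symbols \Delta and d are introduced here for illustration), the change in the phase of that atom's contribution is:

```latex
\Delta\varphi = 2\pi\,\frac{\Delta}{d}, \qquad
\Delta\varphi\big|_{d = 2\,\text{\AA}} = 360^\circ \times \frac{0.2}{2} = 36^\circ, \qquad
\Delta\varphi\big|_{d = 20\,\text{\AA}} = 360^\circ \times \frac{0.2}{20} = 3.6^\circ
```

For a displacement in any other direction the shift is smaller, so a 0.2 A move matters tenfold less at 20 A than at 2 A.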
>> > But then you could say that since any local minima are so broad, all
>> > structures that are even slightly reasonable, (including the correct
>> > one) will be within radius of convergence of the same minimum as far
>> > as the low-res reflections are concerned. The nearest false minimum
>> > involves moving atoms by 5-10 A, so within reason the convergence
>> > point will be completely independent of the starting structure.
>> > Presumably this is why Phenix rigid body refinement starts out at
>> > ultra-low resolution: to increase the radius of convergence. From that
>> > perspective, rather than being the worrisome part, the low-resolution
>> > is the region where we can assume Ian's assumption is correct.
>> >
>> > What about another experiment, which I think we've discussed before.
>> > Take a structure refined to convergence with a pristine free set. Now
>> > refine to convergence against all the data. The purist will say that
>> > the free set is hopelessly corrupted. And sure enough when we take
>> > that structure and calculate free-R with the original set, R-free is
>> > same as R-work within statistical significance.  But I guess adding
>> > the extra 5% reflections will not change any atomic position by more
>> > than 0.2 A (maybe 0.02A), and so we are still well within radius of
>> > convergence of the original unbiased structure. Refining against the
>> > original working set will give back that unbiased structure, and Rfree
>> > will return to its original value.
>> >
>> > This suggests, if the only purpose of Rfree is to get a number to
>> > deposit with the pdb (which it is not), you should first solve your
>> > structure using all the data, fitting the noise; then exclude a free
>> > set and back off on fitting the noise of it to get the R-free.  The
>> > only problem would be that during the refinement without guidance of
>> > R-free, you may have engaged in some practice that hurt the structure
>> > so much that it ends up out of RoC of the well-refined structure. Not
>> > because you were fitting the noise (anyway you are fitting the noise
>> > in your 95% working set) but because you would not have been warned
>> > that some procedure was not helping.
>> >
>> > Very provocative discussion!
>> > eab
>> >
>> >
>> > On 11/25/2014 11:03 AM, Ian Tickle wrote:
>> >> Dear All
>> >>
>> >> I'd like to raise the question again of whether any of this 'jiggling'
>> >> (i.e. addition of random noise to the co-ordinates) is really
>> >> necessary anyway, notwithstanding Dale's valid point that even if it
>> >> were necessary, jiggling in its present incarnation is unlikely to
>> >> work because it's unlikely to erase the influence of low res.
>> >> reflexions.
>> >>
>> >> My claim is that jiggling is completely unnecessary, because I
>> >> maintain that refinement to convergence is all that is required to
>> >> remove the bias when an alternate test set is selected.  In fact I
>> >> claim that it's the refinement, not the jiggling, that's wholly
>> >> responsible for removing the bias.  I know we thrashed this out a
>> >> while back and I recall that the discussion ended with a challenge to
>> >> me to prove my claim that the refine-only Rfrees are indeed unbiased.
>> >> I couldn't see an easy way of doing this which didn't involve
>> >> rebuilding and re-refining the same structure 20 times over, without
>> >> introducing any observer bias.
>> >>
>> >> The present discussion prompted me to think again about this and I
>> >> believe I can prove part of my claim quite easily, that jiggling has
>> >> no effect on the results.  Proving that the resulting Rfrees are
>> >> unbiased is much harder, since as we've seen there's no proof that
>> >> jiggling actually removes the bias as claimed by its proponents.
>> >> However given that said proponents of jiggling+refinement have been
>> >> happy to accept for many years that their results are unbiased, then
>> >> they must be equally happy now to accept that the refinement-only
>> >> results are also unbiased, provided I can demonstrate that the
>> >> difference between the results is insignificant.
>> >>
>> >> The experimental proof rests on comparison between the Rfrees and
>> >> RMSDs of the jiggled+refined and the refined-only structures for the
>> >> 19 possible alternate test sets (assuming 5% test-set size).  If
>> >> jiggling makes no difference as I claim then there should be no
>> >> significant difference between the Rfrees, and only insignificant
>> >> RMSDs, for all pairs of alternate test sets.
>> >>
>> >> However, first we must be careful to establish what is a suitable
>> >> value for the noise magnitude to add to the co-ordinates.  If it's
>> >> too small it won't remove the bias (again notwithstanding Dale's
>> >> point that it's unlikely to have any effect anyway on the low res.
>> >> data); too large and you push it beyond the convergence radius of the
>> >> refinement and end up damaging the structure irretrievably (at least
>> >> unless you're prepared to do significant rebuilding of the model).
>> >>
>> >> For the record here's the crystal info for the test data I selected:
>> >>
>> >> Nres: 96   SG: P41212   Vm: 1.99   Solvent: 0.377
>> >> Resol: 40-1.58 A.
>> >> Working set size: 11563   Test set size: 611 (5%)   Test set: 0
>> >> Refinement program:     BUSTER.
>> >> Noise addition program: PDBSET.
>> >>
>> >> It's wise to choose a small protein since you need to run lots of
>> >> refinements!  However feel free to try the same thing with your own
>> >> data.
>> >>
>> >> First I took care that the starting model was refined to convergence
>> >> using the original test set 0, and I performed 2 sequential runs of
>> >> refinement with BUSTER (the deviations are relative to the input
>> >> co-ordinates in each case):
>> >>
>> >> Ncyc   Rwork   Rfree   RMSD    MaxDev
>> >>   82   0.181   0.230   0.005   0.072
>> >>   51   0.181   0.231   0.002   0.015
>> >>
>> >> The advantage of using BUSTER is that it has its own convergence
>> >> test; with REFMAC you have to guess.
>> >>
>> >> Then I tried a range of input noise values (0.20, 0.25, 0.30, 0.35,
>> >> 0.40, 0.50 A) on the refined starting model.  Note that these are
>> >> RMSDs, not maximum shifts as claimed by the PDBSET documentation.  In
>> >> each case I did 4 sequential runs of BUSTER on the jiggled
>> >> co-ordinates and by looking at the RMSDs and max. shifts I decided
>> >> that 0.25 A RMSD was all the structure could stand without risking
>> >> permanent damage (note that the default noise value in PDBSET is 0.2):
>> >>
>> >> Initial RMSD: 0.248  MaxDev: 0.407
>> >>
>> >> Ncyc   Rwork   Rfree   RMSD    MaxDev
>> >>  358   0.183   0.230   0.052   0.454
>> >>  126   0.181   0.232   0.041   0.383
>> >>   65   0.181   0.232   0.040   0.368
>> >>   50   0.181   0.232   0.040   0.360
>> >>
>> >> The only purpose of the above refinements is to establish the most
>> >> suitable noise value; the resulting refined PDB files were not used.
>> >>
>> >> So then I took the co-ordinates with 0.25 A noise added and for each
>> >> test set 1-19 did 2 sequential runs of BUSTER.
>> >>
>> >> Finally I took the original refined starting model (i.e. without
>> >> noise addition) and again refined to convergence using all 19
>> >> alternate test sets.
>> >>
>> >> The results are attached.  The correlation coefficient between the 2
>> >> sets of Rfrees is 0.992 and the mean RMSD between the sets is 0.04 A,
>> >> so the difference between the 2 sets is indeed insignificant.
>> >>
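The comparison of the two sets of Rfrees described above amounts to a Pearson correlation over the 19 paired values. The following is a minimal sketch in plain Python, assuming the Rfrees have been collected into two lists in the same test-set order. The function name `pearson_cc` is introduced here, and the four Rfree values are invented placeholders, not the attached results.

```python
import math

def pearson_cc(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Placeholder Rfree values for four test sets (invented, NOT the attached results).
rfree_jiggled = [0.231, 0.226, 0.240, 0.235]
rfree_refined_only = [0.232, 0.225, 0.241, 0.234]
cc = pearson_cc(rfree_jiggled, rfree_refined_only)
```

A coefficient close to 1 means the two protocols rank the test sets the same way. It does not by itself show that either set of Rfrees is unbiased, which is the harder question raised earlier in the message.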
>> >> I don't find this result surprising at all: provided the jiggling
>> >> keeps the structure inside the convergence radius of refinement, then
>> >> by definition the refinement will produce the same result
>> >> irrespective of the starting point (i.e. jiggled or not).  If the
>> >> jiggling takes the structure outside the radius of convergence then
>> >> the original structure will not be retrievable without manual
>> >> rebuilding: I'm assuming that's not the goal here.
>> >>
>> >> I suspect that the idea of jiggling may have come about because
>> >> refinements have not always been carried through to convergence:
>> >> clearly if you don't do a proper job of refinement then you must
>> >> expect some of the original bias to remain.  Also to head off the
>> >> suggestion that simulated annealing refinement would fix this I would
>> >> suggest that any kind of SA refinement is only of value for initial
>> >> MR models when there may be significant systematic error in the
>> >> model; it's not generally advisable to perform it on final refined
>> >> models (jiggled or not) when there is no such systematic error present.
>> >>
>> >> Cheers
>> >>
>> >> -- Ian
>> >>
>> >>
>> >> On 21 November 2014 18:56, Dale Tronrud <de...@daletronrud.com
>> >> <mailto:de...@daletronrud.com>> wrote:
>> >>
>> >
>> >
>> > On 11/21/2014 12:35 AM, "F.Xavier Gomis-Rüth" wrote:
>> >  > <snip...>
>> >
>> >> As to the convenience of carrying over a test set to another dataset,
>> >> Eleanor made a suggestion to circumvent this necessity some time ago:
>> >> pass your coordinates through pdbset and add some noise before
>> >> refinement:
>> >
>> >> pdbset xyzin xx.pdb xyzout yy.pdb <<eof
>> >> noise 0.4
>> >> eof
>> >
>> >
>> >     I've heard this "debiasing" procedure proposed before, but I've
>> > never seen a proper test showing that it works.  I'm concerned that
>> > this will not erase the influence of low resolution reflections that
>> > were in the old working set but are now in the new test set.  While
>> > adding 0.4 A gaussian noise to a model would cause large changes to
>> > the 2 A structure factors I doubt it would do much to those at 10 A.
>> >
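The expectation above can be put in rough numbers. As a sketch under stated assumptions (independent, isotropic Gaussian shifts on every atom, with the noise value taken as a 3D RMSD so that the per-axis variance is RMSD^2/3; whether PDBSET defines its noise value exactly this way is an assumption here), the expected scale factor on an atom's structure-factor contribution at spacing d is exp(-2 pi^2 sigma^2 / d^2), the same form as a Debye-Waller factor. The function name `mean_attenuation` is introduced for illustration.

```python
import math

def mean_attenuation(rmsd_3d, d_spacing):
    """Expected scale factor on an atom's structure-factor contribution
    after isotropic Gaussian noise of the given 3D RMSD (both in A)."""
    sigma_sq = rmsd_3d ** 2 / 3.0  # per-axis variance (assumed isotropic)
    return math.exp(-2.0 * math.pi ** 2 * sigma_sq / d_spacing ** 2)

# Compare a high-resolution spacing with two low-resolution ones.
for d in (2.0, 10.0, 20.0):
    print(f"d = {d:4.1f} A  factor = {mean_attenuation(0.4, d):.3f}")
```

Under these assumptions the factor comes out near 0.77 at 2 A but about 0.99 at 10 A, so uncorrelated noise of this size barely perturbs the low-resolution terms.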
>> >     It seems to me that one would have to have random, but
>> > correlated, shifts in atomic parameters to affect the low resolution
>> > data - waves of displacements, sometimes to the left and other times
>> > to the right.  You would need, of course, a superposition of such
>> > waves that span all the scales of resolution in the data set.
>> >
>> >     Has anyone looked at the pdbset jiggling results and shown that
>> > the low resolution data are scrambled?
>> >
>> > Dale Tronrud
>> >
>> >> Xavier
>> >
>> >> On 20/11/14 11:43 PM, Keller, Jacob wrote:
>> >>> Dear Crystallographers,
>> >
>> >>> I thought that for reliable values for Rfree, one needs only to
>> >>> satisfy counting statistics, and therefore using at most a couple
>> >>> thousand reflections should always be sufficient. Almost always,
>> >>> however, some seemingly-arbitrary percentage of reflections is used,
>> >>> say 5%. Is there any rationale for using a percentage rather than
>> >>> some absolute number like 1000?
>> >
>> >>> All the best,
>> >
>> >>> Jacob
>> >
>> >>> ******************************************* Jacob Pearson Keller,
>> >>> PhD Looger Lab/HHMI Janelia Research Campus 19700 Helix Dr, Ashburn,
>> >>> VA 20147 email:kell...@janelia.hhmi.org
>> >>> <mailto:kell...@janelia.hhmi.org>
>> >>> ******************************************* .
>> >
>> >
>> >> --
>> >>
>> >>
>> >
>>
>> --
>> Dr Tim Gruene
>> Institut fuer anorganische Chemie
>> Tammannstr. 4
>> D-37077 Goettingen
>>
>> GPG Key ID = A46BEE1A
>>
>>
>
