Re: [ccp4bb] Does NCS bias a randomly-chosen test set (even if not enforced)?

Dirk Kostrewa Mon, 11 Feb 2008 06:08:12 -0800

Dear Ed,

Although I don't think that a comparison of refinement in a higher and a lower symmetry space group is valid for general NCS cases, I will try to answer your question. Here are my thoughts for two different cases:

(1) You have data to atomic resolution with high I/sigma and low Rsym (I assume high redundancy). The n copies of the asymmetric unit in the unit cell are really identical and obey the higher symmetry (so, not a protein crystal). When you process the data in lower symmetry (say, P1), the non-averaged "higher-symmetry"-equivalent Fobs will differ due to measurement errors, and thus reflections in the working-set will differ from "higher-symmetry"-related reflections in the test-set due to these measurement errors. If you then refine the n copies against the working-set in the lower P1 symmetry, you minimize |Fobs(work)-Fcalc|, resulting in Fcalcs that become closer to the working-set Fobs. As a consequence, the Fcalcs will diverge somewhat from the test-set Fobs. However, since this atomic model is assumed to be very well defined obeying the higher symmetry, and, furthermore, the working-set contains well measured "higher-symmetry"-equivalent Fobs, the resulting atomic positions, and thus the Fcalcs, will be very close to their equivalent values in the higher-symmetry refinement. Therefore, the Fcalcs will also still be very similar to the "higher-symmetry"-equivalent Fobs in the test-set, and I would expect a difference between Rwork and Rfree ranging from "0" to the value of Rsym. In other words, the Fobs in the test-set are not really independent of the reflections in the working-set, and thus Rfree is heavily biased towards Rwork. In this case, I would not expect large differences in the outcome due to the additional application of "NCS"-constraints/restraints.
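A rough numerical sketch of case (1) may help. It is a toy model under stated assumptions, not a refinement: each unique reflection has two "higher-symmetry" mates, one in the working set and one in the test set; both are the true amplitude plus a small relative measurement error; and the Fcalc is assumed to stay identical for the two mates (exactly the assumption under discussion). The knob `fit_to_noise` is hypothetical: 0 means Fcalc equals the true amplitude, 1 means Fcalc reproduces the working-set Fobs exactly. The pairwise disagreement between mates is used as a stand-in for Rsym on amplitudes.

```python
import random


def r_factor(f_obs, f_calc):
    """Conventional R = sum|Fo - Fc| / sum(Fo) on amplitudes."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def simulate(fit_to_noise, n=5000, sigma=0.03, seed=0):
    """Return (Rwork, Rfree, pairwise disagreement) for the toy model.

    fit_to_noise is a hypothetical knob in [0, 1]; sigma is the assumed
    relative measurement error of each symmetry mate.
    """
    rng = random.Random(seed)
    f_true = [rng.uniform(10.0, 100.0) for _ in range(n)]
    # Two independently measured mates of every unique reflection.
    fo_work = [f * (1 + rng.gauss(0, sigma)) for f in f_true]
    fo_test = [f * (1 + rng.gauss(0, sigma)) for f in f_true]
    # Assumption: the model stays symmetric, so both mates share one Fcalc.
    fc = [(1 - fit_to_noise) * t + fit_to_noise * w
          for t, w in zip(f_true, fo_work)]
    r_pair = (sum(abs(w - t) for w, t in zip(fo_work, fo_test))
              / sum(0.5 * (w + t) for w, t in zip(fo_work, fo_test)))
    return r_factor(fo_work, fc), r_factor(fo_test, fc), r_pair


if __name__ == "__main__":
    for a in (0.0, 0.5, 1.0):
        r_work, r_free, r_pair = simulate(a)
        print(f"fit={a:.1f}  Rwork={r_work:.4f}  Rfree={r_free:.4f}  "
              f"gap={r_free - r_work:.4f}  pairwise={r_pair:.4f}")
```

In this toy the gap between Rfree and Rwork grows from about zero to roughly the pairwise disagreement as `fit_to_noise` goes from 0 to 1. That only holds as long as the Fcalcs of the two mates remain equal, which the sketch assumes rather than demonstrates.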

(2) You have data to non-atomic lower resolution, weak I/sigma and poor Rsym. It is impossible to say whether the n copies of the asymmetric unit in the unit cell are really identical, but they are treated so assuming the higher symmetry (so, a real protein crystal). For data processing, the same holds true as for case (1). In contrast, here I think that it makes a difference whether you apply "NCS"-constraints/restraints between the n copies in the lower symmetry P1, or not. If you apply "NCS"-constraints or strong "NCS"-restraints, the n copies are made equal and you get n times the average structure. This is similar to the refinement in the higher symmetry, except that again you minimize the discrepancy between Fcalcs and working-set Fobs, which will increase the discrepancy to the "higher-symmetry"-related Fobs in the test-set. But since the Fobs in the test-set are still not really independent of the Fobs in the working-set, I would again expect maximum differences between Rwork and Rfree in the same order of magnitude as Rsym. So, Rfree is still biased towards Rwork, but it might be more difficult to notice this. But if you do not apply "NCS"-constraints/restraints, you give the less well-defined atomic model more freedom to converge towards the working-set Fobs, resulting in a higher discrepancy between Rwork and Rfree. But since the Fobs in the working set still contain "higher-symmetry"-equivalent Fobs, you will end up with a model that still shows some similarity to the refined structure in the higher symmetry. As a result, the Rfree is even then not really independent of Rwork, but it might be even more difficult to notice this, depending on data resolution and quality. Here, I can't give a range of differences between Rwork and Rfree.

So, this is still not quantitative, and I hope that I'm not completely wrong with my argumentation.

These lower vs. higher symmetry examples given above are only transferable to reality in special NCS cases with pseudo-higher symmetry (what Dale Tronrud discussed). Leaving these special cases aside, what do the NCS experts say about my original statement that precautions against NCS bias in Rfree must only be taken if NCS-constraints/restraints are really applied during refinement?


Best regards,

Dirk.

On 08.02.2008 at 21:43, Edward A. Berry wrote:

Clarification-

Someone wrote:
Ah- that's going way too fast for the beginners, at least one of them!
Can someone explain why the R-free will be very close to the R-work,
preferably in simple concrete terms like Fo, Fc, at sym-related
reflections, and the change in the Fc resulting from a step of refinement?
Ed
Hi Ed,
  Here's what I think they're saying:
If the NCS is almost crystallographic, then one wedge of spots will be almost identical to another wedge. If spot "a" is in the test set, but the almost-crystallographically identical spot "a' " in the 2nd wedge isn't, then because you're refining directly against a', spot a doesn't really count as "free".
  Was that the question?
Thanks, but,

Here we are talking about refining a structure in an artificially low
space group, to get away from the complexities of the G-function and
degree of overlap. The "NCS" brings a reflection in the test set exactly
onto a reflection in the work set. I'm asking "so what?"

Think about what you mean when you say "spot a and spot a' are
crystallographically identical".

Do you mean the Fo are identical?
They are not, because if we consider it a lower space group then
we will not average these spots, but have separate experimentally
determined values for them.  However as pointed out by Jon Wright
and Dean Madden yesterday, the difference between sym-related Fobs
is usually much smaller than the difference between Fo and Fc, so
the sym-related Fobs can be considered almost the same in comparison
to Fc. Specifically, they are likely to be both on the same side of Fc,
so changing two Fc in the same direction will have the same effect on
|Fo-Fc| at the two reflections.
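The "same side of Fc" point can be checked with a small toy simulation. This is only a sketch under assumed error models: the relative model error `model_err` and measurement error `meas_err` are hypothetical Gaussian widths chosen for illustration, and both sym-mates start with the same Fc.

```python
import random


def same_side_fraction(model_err, meas_err, n=20000, seed=1):
    """Fraction of sym-related pairs whose two Fo lie on the same side of Fc.

    model_err and meas_err are assumed relative Gaussian errors
    (hypothetical values, not taken from any real data set).
    """
    rng = random.Random(seed)
    same = 0
    for _ in range(n):
        f_true = rng.uniform(10.0, 100.0)
        fc = f_true * (1 + rng.gauss(0, model_err))    # shared Fc
        fo_a = f_true * (1 + rng.gauss(0, meas_err))   # working-set mate
        fo_b = f_true * (1 + rng.gauss(0, meas_err))   # test-set mate
        if (fo_a - fc) * (fo_b - fc) > 0:
            same += 1
    return same / n


if __name__ == "__main__":
    # Model error much larger than measurement error (the usual situation).
    print(same_side_fraction(model_err=0.20, meas_err=0.03))
    # Model error comparable to measurement error.
    print(same_side_fraction(model_err=0.03, meas_err=0.03))
```

With the model error much larger than the measurement error, the large majority of pairs fall on the same side of Fc; when the two errors are comparable, the fraction drops towards about two thirds in this toy.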

Do you mean the Fc are identical? If we start with the symmetrical
structure refined in the higher space group, their initial values will
be the same. However if we do not enforce NCS, then the changes induced
by refinement will be asymmetric, and the two "NCS-related" Fc will
start to diverge. A change which is made because it improves the fit for
some reflections in the working set may well make the fit worse for
the related reflections in the test set. The only way they are coupled
is through the fact that if a change makes the model more like the real
structure, then the expected value of the resulting change in |Fo-Fc|
is negative for all reflections.

Remember R and Rfree will be statistically the same before refinement,
and start to diverge once refinement begins. Dirk's lesson seems
to imply they will diverge less if there is (perfect) NCS, even if
the NCS is not applied.
(I'm probably wrong, but I want someone to show me, and not with hand-waving
arguments or invocation of crystallographic intuition or such)
To convince me, someone needs to show that the expected value of the change
in |Fo-Fc| at a test reflection upon a change in the model (a step of refinement)
is negative, even in the absence of any real improvement in the model,
simply because the change reduces |Fo-Fc| at a sym-related working
reflection.
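The quantity asked for here can at least be written down in a toy form. The sketch below does not settle the question; it only shows how the answer depends on one assumption. Amplitudes are in relative units (true F = 1), and `coupling` is a hypothetical knob: 1 means the Fc of the test reflection changes exactly like the Fc of its sym-related working reflection (a symmetric model change), 0 means the two changes are unrelated. How large the coupling really is without enforced NCS is the open point of the thread.

```python
import math
import random


def mean_test_change(coupling, n=20000, model_err=0.2, meas_err=0.03,
                     step=0.02, seed=2):
    """Mean change in |Fo - Fc| at the test reflection after one toy step.

    All parameters are illustrative assumptions; coupling must be in [0, 1].
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        fc = 1.0 + rng.gauss(0, model_err)        # shared starting Fc
        fo_work = 1.0 + rng.gauss(0, meas_err)
        fo_test = 1.0 + rng.gauss(0, meas_err)
        # Fixed-size step moving Fc towards the working-set Fo.
        d_work = -step * math.copysign(1.0, fc - fo_work)
        # Change of Fc at the test reflection: partly coupled, partly random.
        d_test = (coupling * d_work
                  + math.sqrt(1 - coupling ** 2) * step * rng.gauss(0, 1))
        total += abs(fo_test - (fc + d_test)) - abs(fo_test - fc)
    return total / n


if __name__ == "__main__":
    for c in (0.0, 0.5, 1.0):
        print(f"coupling={c:.1f}  mean change in |Fo-Fc| (test) = "
              f"{mean_test_change(c):+.5f}")
```

In this toy, a fully coupled change gives a negative expected change at the test reflection even though nothing about the "model" has really improved, while an uncoupled change gives essentially none. So the argument hinges on how strongly the two Fc stay coupled during unrestrained refinement, which this sketch assumes rather than derives.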

Ed



*******************************************************
Dirk Kostrewa
Gene Center, A 5.07
Ludwig-Maximilians-University
Feodor-Lynen-Str. 25
81377 Munich
Germany
Phone:  +49-89-2180-76845
Fax:    +49-89-2180-76999
E-mail: [EMAIL PROTECTED]
*******************************************************
