Re: [ccp4bb] Does NCS bias a randomly-chosen test set (even if not enforced)? [ccp4bb] an over refined structure

Edward Berry Thu, 21 Feb 2008 11:58:53 -0800

Dale Tronrud wrote:


   In summary, this argument depends on two assertions that you can
argue with me about:

   1) When a parameter is being used to fit the signal it was designed
for, the resulting model develops predictive power and can lower
both the working and free R.  When a signal is perturbing the value
of a parameter for which it was not designed, it is unlikely to improve
its predictive power and the working R will tend to drop, but the free
R will not (and may rise).

   2) If the unmodeled signal in the data set is a property in real
space and has the same symmetry as the molecule in the unit cell,
the inappropriate fitting of parameters will be systematic with
respect to that symmetry and the presence of a reflection in the
working set will tend to cause its symmetry mate in the test set
to be better predicted despite the fact that this predictive power
does not extend to reflections that are unrelated by symmetry.
This "bias" will occur for any kind of "error" as long as that
"error" obeys the symmetry of the unit cell in real space.


Well, I've had time now to think about this, and I find myself
agreeing with point 1 and most of point 2 (that the unmodeled
signal has the same symmetry as the model), but I would still
argue that NCS symmetry does not bias the free set if it is
not enforced. Sorry to be so persistent, and I'm sure 95% of
readers will want to stop here, but:

I don't see that the symmetry of unmodeled signal will cause
a free reflection whose symmetry mate is working to be better
predicted than another free reflection which is not sym-related
to a working reflection, or than in the case where there is no
symmetry.  I think this assertion comes from noting that in the
final refined structure the sym-related Fc's will be correlated,
and since the Fo's are also correlated, and since the sign of
|Fo-Fc| is correlated because of the symmetry of the un-modeled
signal, this correlation of sym-related Fc's results in an artificial
reduction of |Fo-Fc| at test reflections.

That might be true if the correlation between sym-related Fc's were
perfect. I want to argue that the correlation between Fc's results
only from the approach of the test Fc to the F of the true
(symmetrical) structure, i.e. the correlation follows from, rather
than contributes to, the decrease of |Fo-Fc| at test reflections.
The decrease in |Fo-Fc| at test reflections results only from
the approach of the electron density of the model to that of the
real structure (and the fact that the Fo's are good estimates of the
diffraction pattern of the real structure), and this is exactly what
the Free-R is supposed to measure.
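
For readers who want the quantity being argued about written down, here
is a minimal sketch of the usual R-factor computed separately over the
working and free reflections. The function name and arguments are my own
for illustration, and it assumes the Fc amplitudes have already been put
on the same scale as the Fo amplitudes.

```python
import numpy as np

def r_work_and_free(f_obs, f_calc, free_mask):
    """Return (R_work, R_free) with R = sum|Fo - Fc| / sum(Fo).

    f_obs, f_calc: observed and calculated amplitudes (already scaled).
    free_mask: True for reflections held out as the test (free) set.
    """
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=float)
    free_mask = np.asarray(free_mask, dtype=bool)

    def r_factor(selection):
        # Sum of absolute amplitude differences over the selected
        # reflections, normalised by the sum of observed amplitudes.
        return np.abs(f_obs[selection] - f_calc[selection]).sum() / f_obs[selection].sum()

    return r_factor(~free_mask), r_factor(free_mask)
```

The free reflections never enter the refinement target, so R_free only
drops when the model predicts data it was not fitted to.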

Let me describe the refinement process in perhaps oversimplified
steps which make my argument clear, and then we may want to argue
about the individual assumptions.

In my view it all depends on what is
driving what. Briefly, the need to minimize |Fo-Fc| at working
reflections drives the structural changes, the resulting approach
of the structure to the true structure drives the decrease in
|Fo-Fc| at free reflections, and it is only this reduction in
|Fo-Fc| at free reflections which brings about the correlation
between free Fc and sym-related working Fc. You cannot then turn
around and say the correlation between sym-related free and
working Fc biases the |Fo-Fc|Free.

To elaborate:
The individual structural changes are driven by the need to minimize
|Fo-Fc| at the working reflections. The refinement program reduces
|Fo-Fc| at working reflections by a combination of (1) appropriate
structural changes which actually make the model closer to the
true structure, (2) inappropriate structural changes which happen
to reduce |Fo-Fc| by accounting for some of the un-modeled signal,
but not in a way that resembles the real structure, and (3) fitting
the noise in the measurements.

The reduction in |Fo-Fc| at free reflections is driven by the fact
that the changes make the model a better approximation to the
electron density of the real structure, driving the Fc closer to the
theoretical F's of the true structure, of which the Fo are a good
approximation.
This is mainly due to changes of type 1 above, appropriate modeling of
the structure. The inappropriate movement of atoms into density may
also improve free |Fo-Fc|, at least at low resolution, but will give a
smaller decrease in Rfree than in Rwork. And fitting the noise will
in general move the structure away from the true structure and so
tend to increase |Fo-Fc|free.  The point, which we may need to argue
about, is that the only force driving the reduction of |Fo-Fc| at free
reflections is the approach of the model electron density to that
of the true structure.

Finally, the correlation of free Fc with working Fc is driven
by the approach of both to their respective Fo, together with the
fact that the Fo are highly correlated. The correlation can never
be better than the correlation of free Fc to Fo, which we said in the
previous step is due to improvement of the model. To the extent that
|Fo-Fc|free does not decrease much, say because of fitting noise
or inappropriate fitting of signal, the correlation of free Fc
with working Fc will be poor and will not bias the free |Fo-Fc|.
Since the correlation of free and working Fc is driven by the
approach of free Fc to Fo, it cannot augment that approach and hence
bias Rfree.

If we look after the fact and see that there is a correlation between
sym-related free and working Fc, and we want to say that that
correlation biases the Rfree, we have to ask how good is the
correlation, and where did it come from. If the correlation
results from the decrease in |Fo-Fc|free then it will be trailing
behind that decrease, so to speak, and cannot be driving it farther on.

A useful example is to make the analogy with cross-crystal restraints:
say there is no NCS but you have another structure refined from an
isomorphous crystal with a different free set. Say instead of imposing NCS
restraints we restrain the new structure to be similar to the old.
(Whether or not this is a good analogy is arguable).
This will cause the Fc of the new structure to approach those of the old
structure, and since many of the free reflections were working
reflections in that refinement, they will be biased*.

Then suppose we don't apply cross-crystal restraints. Now it is very
clear that there can be no bias from the previous structure, because
it is not involved in any way in the new refinement. Nonetheless the
Fc's of the new refined structure will be highly correlated with those
of the previous structure, and noting this after the fact one might
want to argue that this results in biasing the Free-R of the new
structure. Again it does not, because the correlation between free Fc's
in the new structure and working Fc's with the same hkl in the old
structure results only from the approach of both to the theoretical F's
of the correct structure and (what is nearly the same on average) to
the Fo's. And this is exactly what the Free-R is supposed to measure.
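
The point can be checked with a toy calculation. This is only a sketch
under assumptions of my own, not a real refinement: the "true"
amplitudes are random numbers, and each "refined model" is just the
truth plus its own independent error. All names and parameter values
are invented for illustration.

```python
import numpy as np

def independent_model_correlations(n=20_000, sigma_model=10.0, seed=0):
    """Correlate the Fc's of two models that never saw each other.

    Returns (corr_fc, corr_err): the correlation between the two sets
    of Fc, and the correlation between their errors (Fc - F_true).
    """
    rng = np.random.default_rng(seed)
    f_true = rng.uniform(50.0, 150.0, n)  # stand-in for the true amplitudes
    # Each model approaches F_true with its own independent error.
    fc_old = f_true + sigma_model * rng.standard_normal(n)
    fc_new = f_true + sigma_model * rng.standard_normal(n)
    corr_fc = np.corrcoef(fc_old, fc_new)[0, 1]
    corr_err = np.corrcoef(fc_old - f_true, fc_new - f_true)[0, 1]
    return corr_fc, corr_err

corr_fc, corr_err = independent_model_correlations()
print(f"corr(Fc_old, Fc_new) = {corr_fc:.3f}")
print(f"corr(errors)         = {corr_err:.3f}")
```

In this toy the two sets of Fc come out strongly correlated although
neither model influenced the other, while their errors are essentially
uncorrelated. The correlation comes entirely from both approaching the
same true F's.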

Ed
==============
*but only because of the systematic error due to unmodeled signal:
if the error were only random error of measurement, it would be equally
likely to be in the opposite direction in the other dataset (or, in the
NCS case, at a sym-related reflection).
To test if this is the case, refine your old model against a new dataset
using the same Free-R set. If the difference between R-free and R was
due to fitting random observational error, the error will be different,
and R and Rfree will be the same initially. If it is at least partly
due to systematic error (i.e. error which is the same in all datasets)
then you will start out with a gap between R and Rfree.
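
A toy simulation of this test, as a rough sketch only. The assumptions
are mine: the old model is taken to have absorbed a fixed fraction of
the old dataset's error at its working reflections and nothing at its
free reflections, the systematic error is identical in both datasets,
and the random error is redrawn for the new dataset. The function names
and all parameter values are hypothetical.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    # R = sum|Fo - Fc| / sum(Fo)
    return np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs)

def r_gap_on_new_data(sigma_sys, sigma_rand=5.0, absorbed=0.5,
                      n=50_000, seed=0):
    """Return R_free - R_work for the OLD model scored against NEW data."""
    rng = np.random.default_rng(seed)
    f_true = rng.uniform(50.0, 150.0, n)
    free = rng.random(n) < 0.10                    # same free set for both datasets
    systematic = sigma_sys * rng.standard_normal(n)  # shared by both datasets
    noise_old = sigma_rand * rng.standard_normal(n)
    noise_new = sigma_rand * rng.standard_normal(n)

    # Old model: overfits part of the old data's error, but only at
    # working reflections. Free reflections carry no fitted error.
    fitted_error = absorbed * (systematic + noise_old)
    f_calc = f_true + np.where(free, 0.0, fitted_error)

    f_obs_new = f_true + systematic + noise_new
    r_work = r_factor(f_obs_new[~free], f_calc[~free])
    r_free = r_factor(f_obs_new[free], f_calc[free])
    return r_free - r_work

print("gap, random error only:     ", r_gap_on_new_data(sigma_sys=0.0))
print("gap, with systematic error: ", r_gap_on_new_data(sigma_sys=5.0))
```

Under these assumptions the gap against the new data survives only when
a systematic component is present. With purely random error the gap
closes (in this toy it even reverses slightly, because the fitted old
noise now counts against the working reflections).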
