Dale Tronrud wrote:
In summary, this argument depends on two assertions that you can argue with me about:

1) When a parameter is being used to fit the signal it was designed for, the resulting model develops predictive power and can lower both the working and free R. When a signal is perturbing the value of a parameter for which it was not designed, it is unlikely to improve its predictive power and the working R will tend to drop, but the free R will not (and may rise).

2) If the unmodeled signal in the data set is a property in real space and has the same symmetry as the molecule in the unit cell, the inappropriate fitting of parameters will be systematic with respect to that symmetry and the presence of a reflection in the working set will tend to cause its symmetry mate in the test set to be better predicted despite the fact that this predictive power does not extend to reflections that are unrelated by symmetry. This "bias" will occur for any kind of "error" as long as that "error" obeys the symmetry of the unit cell in real space.
Well, I've had time now to think about this, and I find myself agreeing with point 1 and most of point 2 (that the unmodeled signal has the same symmetry as the model), but I still would argue that NCS symmetry does not bias the free set if it is not enforced. Sorry to be so persistent, and I'm sure 95% of readers will want to stop here, but:

I don't see that the symmetry of unmodeled signal will cause a free reflection whose symmetry mate is working to be better predicted than another free reflection which is not sym-related to a working reflection, or than in the case where there is no symmetry. I think this assertion comes from noting that in the final refined structure the sym-related Fc's will be correlated, and since the Fo's are also correlated, and since the sign of |Fo-Fc| is correlated because of the symmetry of the un-modeled signal, this correlation of sym-related Fc's results in an artificial reduction of |Fo-Fc| at test reflections. That might be true if the correlation between sym-related Fc's were perfect.

I want to argue that the correlation between Fc's results only from the approach of the test Fc to the F of the true (symmetrical) structure, i.e. the correlation follows from, rather than contributes to, the decrease of |Fo-Fc| at test reflections. The decrease in |Fo-Fc| at test reflections results only from the approach of the electron density of the model to that of the real structure (and the fact that the Fo's are good estimates of the diffraction pattern of the real structure), and this is exactly what the Free-R is supposed to measure.

Let me describe the refinement process in perhaps oversimplified steps which make my argument clear, and then we may want to argue about the individual assumptions. In my view it all depends on what is driving what.
Briefly, the need to minimize |Fo-Fc| at working reflections drives the structural changes, the resulting approach of the structure to the true structure drives the decrease in |Fo-Fc| at free reflections, and it is only this reduction in |Fo-Fc| at free reflections which brings about the correlation between free Fc and sym-related working Fc. You cannot then turn around and say the correlation between sym-related free and working Fc biases the |Fo-Fc|free.

To elaborate:

The individual structural changes are driven by the need to minimize |Fo-Fc| at the working reflections. The refinement program reduces |Fo-Fc| at working reflections by a combination of (1) appropriate structural changes which actually make the model closer to the true structure, (2) inappropriate structural changes which happen to reduce |Fo-Fc| by accounting for some of the un-modeled signal, but not in a way that resembles the real structure, and (3) fitting the noise in the measurements.

The reduction in |Fo-Fc| at free reflections is driven by the fact that the changes make the model a better approximation to the electron density of the real structure, driving the Fc closer to the theoretical F's of the true structure, of which the Fo are a good approximation. This is mainly due to changes of type 1 above, appropriate modeling of the structure. The inappropriate movement of atoms into density (type 2) may also improve free |Fo-Fc|, at least at low resolution, but will give a smaller decrease in Rfree than in Rwork. And fitting the noise (type 3) will in general move the structure away from the true structure and so tend to increase |Fo-Fc|free. The point, which we may need to argue about, is that the only force driving the reduction of |Fo-Fc| at free reflections is the approach of the model electron density to that of the true structure.

Finally, the correlation of free Fc with working Fc is driven by the approach of both to their respective Fo, together with the fact that the Fo are highly correlated.
The correlation can never be better than the correlation of free Fc to Fo, which we said in the previous step is due to improvement of the model. To the extent that |Fo-Fc|free does not decrease much, say because of fitting noise or inappropriate fitting of signal, the correlation of free Fc with working Fc will be poor and will not bias the free |Fo-Fc|. Since the correlation of free and working Fc is driven by the approach of free Fc to Fo, it cannot augment that approach and hence bias Rfree.

If we look after the fact and see that there is a correlation between sym-related free and working Fc, and we want to say that that correlation biases the Rfree, we have to ask how good the correlation is, and where it came from. If the correlation results from the decrease in |Fo-Fc|free then it will be trailing behind that decrease, so to speak, and cannot be driving it farther on.

A useful example is to make the analogy with cross-crystal restraints: say there is no NCS but you have another structure refined from an isomorphous crystal with a different free set. Say instead of imposing NCS restraints we restrain the new structure to be similar to the old. (Whether or not this is a good analogy is arguable.) This will cause the Fc of the new structure to approach those of the old structure, and since many of the free reflections were working reflections in that refinement, they will be biased*.

Then suppose we don't apply cross-crystal restraints. Now it is very clear that there can be no bias from the previous structure, because it is not involved in any way in the new refinement. Nonetheless the Fc's of the new refined structure will be highly correlated with those of the previous structure, and noting this after the fact one might want to argue that this results in biasing the Free-R of the new structure.
Again it does not, because the correlation between free Fc's in the new structure and working Fc's with the same hkl in the old structure results only from the approach of both to the theoretical F's of the correct structure and (what is nearly the same on average) to the Fo's. And this is exactly what the Free-R is supposed to measure.

Ed

==============

*but only because of the systematic error due to unmodeled signal: if the error were only random error of measurement, it would be equally likely to be in the opposite direction in the other dataset (or at a sym-related reflection). To test if this is the case, refine your old model against a new dataset using the same Free-R set. If the difference between R-free and R was due to fitting random observational error, the error will be different in the new dataset, and R and Rfree will be the same initially. If it is at least partly due to systematic error (i.e. error which is the same in all datasets) then you will start out with a gap between R and Rfree.
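
The starting point of the test in the footnote can be sketched in a few lines of Python. This is only a minimal illustration, not anyone's actual refinement code: the function names (r_factor, initial_gap) are hypothetical, the amplitudes are made-up numbers, and it assumes the old model's Fc are already on the same scale as the new Fo (no scale factor is fitted). It computes R and Rfree for the old model against the new dataset, with the same free flags, before any refinement cycles.

```python
def r_factor(f_obs, f_calc):
    """Conventional R = sum(|Fo - Fc|) / sum(Fo) over one set of reflections.

    Assumes Fc is already on the same scale as Fo (no scale factor applied).
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    total = sum(f_obs)
    if total <= 0:
        raise ValueError("need at least one reflection with positive Fo")
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / total


def initial_gap(f_obs_new, f_calc_old, is_free):
    """Return (R, Rfree, Rfree - R) for an old model against a new dataset.

    is_free[i] is True when reflection i belongs to the free (test) set;
    the flags must be the ones used when the old model was refined.
    """
    work_obs, work_calc, free_obs, free_calc = [], [], [], []
    for o, c, free in zip(f_obs_new, f_calc_old, is_free):
        if free:
            free_obs.append(o)
            free_calc.append(c)
        else:
            work_obs.append(o)
            work_calc.append(c)
    r_work = r_factor(work_obs, work_calc)
    r_free = r_factor(free_obs, free_calc)
    return r_work, r_free, r_free - r_work


# Made-up amplitudes, purely to show the call; not real data.
f_calc_old = [100.0, 50.0, 80.0, 40.0]   # Fc from the previously refined model
f_obs_new = [110.0, 45.0, 80.0, 50.0]    # Fo from the new dataset
is_free = [False, True, False, True]     # same Free-R set as before

r_work, r_free, gap = initial_gap(f_obs_new, f_calc_old, is_free)
print(f"R = {r_work:.3f}, Rfree = {r_free:.3f}, gap = {gap:.3f}")
```

On the footnote's reasoning, a gap near zero at this stage would point to the original R/Rfree difference having come from fitting random measurement error, while a gap that is already present would point to systematic error common to both datasets. With a realistic number of free reflections the gap has its own sampling scatter, so a small nonzero value is not by itself evidence either way.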