On 5 Jun 2019, at 13:58, Ian Tickle <ianj...@gmail.com> wrote:
Hi Jon
Sorry I didn't intend for my response to be interpreted as saying that anyone
has suggested directly that the measurement errors of NCS-related reflection
amplitudes are correlated. In fact the opposite is almost certainly true since
the only obvious way in practice that errors in Fobs could be correlated is via
errors in the batch scale factors which would introduce correlations between
errors in Fobs for reflections in the same or adjacent images, but that has
nothing to do with NCS. That's the 'elephant in the room': no-one has
suggested that reflections on the same or adjacent images should not be split
between the working and test sets, yet that's easily the biggest contributor to
CV bias with or without NCS! I think taking that effect into account would be
much more productive than worrying about NCS, but performing the test-set
sampling in shells can't possibly address that, since the images obviously cut
across all shells.
The point I was making was that correlation of errors in NCS-related Fobs would
appear to be the inevitable _implication_ of what certainly has been claimed,
namely that NCS can introduce bias into CV statistics if the test-set sampling
is not done correctly, i.e. if NCS-related Fobs are split between the working
and test sets. Unless there's something I've missed, that's the only possible
explanation for that claim. This is because overfitting results from fitting
the model to the errors in Fobs, and the CV bias arises from correlation of
those errors if the NCS-related Fobs are split up, thus causing the degree of
overfitting to be underestimated and giving a too-rosy picture of the structure
quality. Indeed you seem to be saying that because the NCS-related Fobs are
correlated (a patently true statement), then it follows that the errors in
those Fobs are also correlated, or at least more correlated than for
non-NCS-related Fobs, but I just don't see how that can be true.
Rfree is not unbiased: as a measure of the agreement it is biased upwards by
overfitting (otherwise how could it be used to detect overfitting?), because the
model fails to fit the uncorrelated errors in the test-set Fobs, just as Rwork is
biased downwards by fitting to the errors in the working-set Fobs. Overfitting
becomes immediately apparent whenever you perform any refinement, so the only
point at which there is no overfitting is for the initial model when Rwork and
Rfree are equal, apart from a small difference arising from random sampling of
the test-set (that sampling error could be reduced by performing refinements
with all 20 working/test-set combinations and averaging the R values). From
there on the 'gap' between Rwork and Rfree is a measure of the degree of
overfitting, so we should really be taking some average of Rwork and Rfree as
the true measure of agreement (though the biases are not exactly equal and
opposite so it's not a simple arithmetic mean). The goal
of choosing the appropriate refinement parameters, restraints and weights is to
_minimise_ overfitting, not eliminate it. It is not possible to eliminate it
completely: if it were then Rwork and Rfree would become equal (apart from that
small effect from random sampling).
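To make the Rwork/Rfree 'gap' concrete, here is a minimal Python sketch (not part of the original message). It assumes the conventional definition R = sum|Fo - Fc| / sum Fo, that Fc is already on the scale of Fo, and a made-up toy list of amplitudes; the function names are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Conventional R = sum|Fo - Fc| / sum Fo over a set of amplitudes."""
    return sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc)) / sum(f_obs)


def r_work_and_free(f_obs, f_calc, free_flags):
    """Return (Rwork, Rfree, gap) for one working/test split."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor(*zip(*work))
    r_free = r_factor(*zip(*test))
    return r_work, r_free, r_free - r_work


# Toy amplitudes (invented for illustration only).
f_obs = [100.0, 200.0, 150.0, 50.0, 120.0]
f_calc = [90.0, 210.0, 150.0, 60.0, 126.0]
free_flags = [False, False, False, True, True]

r_work, r_free, gap = r_work_and_free(f_obs, f_calc, free_flags)
print(f"Rwork={r_work:.4f} Rfree={r_free:.4f} gap={gap:.4f}")
```

Averaging the returned values over several independent splits is one way to reduce the sampling error mentioned above.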
I don't follow your argument about correlation of Fobs from NCS. Overfitting, and
therefore CV bias, arises from the _errors_ in the Fobs not from the Fobs themselves, and
there's no reason to believe that the Fobs should be correlated with their errors. You
say "any correlation between the test-set and the working-set F's due to NCS would
be expected to reduce R-free". If the working and test sets are correlated by NCS
that would mean that Rwork is correlated with Rfree so they would be reduced equally!
There are two components of the Fobs - Fcalc difference: Fcalc - Ftrue (the model error)
and Fobs - Ftrue (the data error). The former is completely correlated between the
working and test sets (obviously since it's the same model) so what you do to one you
must do to the other. The latter can only be correlated by NCS if NCS has an effect on
errors in the Fobs, which it doesn't, or by some other effect such as errors in batch
scales that are unrelated to NCS.
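Written out as a worked identity, the two components look as follows. This is a sketch in notation that does not appear in the thread: the epsilon symbols are labels introduced here, and amplitudes are treated as plain real numbers for simplicity.

```latex
\[
F_{\mathrm{obs}} - F_{\mathrm{calc}}
  = \underbrace{\left(F_{\mathrm{obs}} - F_{\mathrm{true}}\right)}_{\varepsilon_{\mathrm{data}}}
  - \underbrace{\left(F_{\mathrm{calc}} - F_{\mathrm{true}}\right)}_{\varepsilon_{\mathrm{model}}}
\]
```

The model term comes from the same model for every reflection, working or test, whereas the data term is specific to each measured reflection, which is why the argument turns on whether NCS can correlate the data term.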
Overfitting is related to the data/parameter ratio so you don't observe the
effects of overfitting until you change the model, the parameter set or the
restraints. If there were no errors there would be no overfitting and no CV
bias (actually there would be no need for cross-validation!).
Of course as you say, your tests suggest that there is no CV bias from NCS, in
which case there's absolutely nothing to explain!
Cheers
-- Ian
On Tue, 4 Jun 2019 at 21:33, Jonathan Cooper
<00000c2488af9525-dmarc-requ...@jiscmail.ac.uk> wrote:
Ian, statistics is not my forte, but I don't think anyone is suggesting
that the measurement errors of NCS-related reflection amplitudes are
correlated. In simple terms, since NCS-related F's should be correlated, the
working-set reflection amplitudes could be correlated with those in the
test-set, if the latter is chosen randomly, rather than in shells. Am I right
in saying that R-free not only indicates over-fitting but also acts as an
unbiased measure of the agreement between Fo and Fc? During a well-behaved
refinement run, in the cycles before any over-fitting becomes apparent, the
decrease in R-free value will indicate that the changes being made to the model
are making it more consistent with Fo's. In these stages, any correlation
between the test-set and the working-set F's due to NCS would be expected to
affect the R-free (cross-validation bias), making it lower than it would be if
the test set had been chosen in resolution shells? However, you are always
right and, as you know, I failed to detect any such effect in my limited
tests. Thanks to you and others for replying.
On Tuesday, 4 June 2019, 02:07:10 BST, Edward A. Berry <ber...@upstate.edu> wrote:
On 05/19/2019 08:21 AM, Ian Tickle wrote:
~~~
>> So there you have it: what matters is that the _errors_ in the
NCS-related amplitudes are uncorrelated, or at least no more correlated than the
errors in the non-NCS-related amplitudes, NOT the amplitudes themselves.
Thanks, Ian!
I would like to think that it is the errors in Fobs that matter (as may be
the case), because then:
1. ncs would not bias R-free even if you _do_ use ncs
constraints/restraints. (changes in Fcalc due to a step of refinement would be
positively correlated between sym-mates, but if the sign of (Fo-Fc) is opposite
at the sym-mate, what improves the working reflection would worsen the free)
2. There would be no need to use the same free set when you refine the
structure against a new dataset (as for ligand studies) since the random errors
of measurement in Fobs in the two sets would be unrelated.
However when I suggested that in a previous post, I was reminded that errors in Fobs
account for only a small part of the difference (Fo-Fc). The remainder must be due to
inability of our simple atomic models to represent the actual electron density, or its
diffraction; and for a symmetric structure and a symmetric model, that difference is
likely to be symmetric. Whether that difference represents "noise" that we
want to avoid fitting is another question, but it is likely that (Fo-Fc) will be
correlated with sym-mates. So I settled for convincing myself that the changes in Fc
brought about by refinement would be uncorrelated, and thus the _changes_ in (Fo-Fc) at
each step would be uncorrelated.
Below are some of the ideas I came up with in trying to think about this,
and about bias in general. (Not very well organized and not the best of prose,
but if one is a glutton for punishment, or just wants to see how the mind of a
madman works . . .)
Warning- some of this is contrary to current consensus opinion and the
conclusions may be, in the words of a popular autobuilding program, partly
WRONG! In particular, the idea that coupling by the G-function does not bias
R-free, but rather is the only reason that R-free works at all!
- - - - - - - - - -
The differences (Fo-Fc) can be divided between (1) errors in measurement
of reflection intensities and (2) failure of the model to represent the
true structure. The first can be considered "noise" and we would expect
it to be random, with no correlation between symm mates.
However most of the difference between Fc and Fobs is not due to random
noise in the data, but to failures of our model to accurately represent
the real thing. These differences are likely to be ncs-symmetric.
Leaving aside the question of whether or not we want to fit this kind of
"noise" (bringing the model closer to the real structure?), we conclude
that (Fo-Fc) is likely to be correlated between ncs-mates.
But for refinement against the working set to bias the contribution of
sym-related free-set reflections to R-free would require that _changes_
in |Fo-Fc| from a step of refinement would be ncs-correlated. If on the
contrary they are not correlated, i.e. if a change that decreases
|Fo-Fc| for a working reflection is equally likely to decrease or
increase |Fo-Fc| for its sym mate (which may be) in the free set, then
it is hard to see how refinement against the working reflection would
bias R-free.
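The test described in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the amplitudes are invented toy numbers, the helper names are hypothetical, and a plain Pearson coefficient stands in for whatever statistic one would actually prefer.

```python
import math


def residual_changes(f_obs, fc_before, fc_after):
    """Change in |Fo - Fc| per reflection caused by one refinement step."""
    return [abs(fo - fa) - abs(fo - fb)
            for fo, fb, fa in zip(f_obs, fc_before, fc_after)]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy data: index i in the "a" lists is the sym-mate of index i in the "b" lists.
fo_a, fc_a_before, fc_a_after = [100.0, 200.0, 150.0, 80.0], [90.0, 215.0, 150.0, 70.0], [95.0, 205.0, 152.0, 78.0]
fo_b, fc_b_before, fc_b_after = [102.0, 198.0, 149.0, 82.0], [92.0, 213.0, 151.0, 72.0], [90.0, 210.0, 149.0, 75.0]

d_a = residual_changes(fo_a, fc_a_before, fc_a_after)
d_b = residual_changes(fo_b, fc_b_before, fc_b_after)
r = pearson(d_a, d_b)
print("correlation of changes in |Fo-Fc| between mates:", round(r, 3))
```

A coefficient near zero over many pairs would mean that improving the fit to a working reflection says little about its mate in the free set.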
Under what conditions would changes in |Fo-Fc| for symmetry related
reflections be correlated? This would be the case if change in Fc
correlates AND the sign of (Fo-Fc) correlates. Again, if the difference
were only due to random error in Fobs, then the sign of Fo-Fc of a
symmetry related reflection would be as likely to be the opposite as the
same (as the original reflection) so even if changes in Fc are
correlated, what improves the fit to the original reflection would be as
likely to worsen the fit to its mate. But we concluded above that Fo-Fc
is likely to be correlated by symmetry, since the shortcomings of our
model are likely to be symmetric. So we ask if changes in Fc are
correlated.
So why should a structural change result in correlated changes of
symm-related Fc's?
The Fc is the amplitude of the best-fit sine wave (of the specified
frequency) to the projection of the density of the crystal onto the
specified scattering vector. The refinement program can increase Fcalc
by moving an atom so that its projection on the scattering vector moves
toward a peak of that sine wave, or decrease it by moving away from a peak.
If the projection of an atom on the scattering vector moves toward a
peak, the density becomes more peaked and the amplitude increases, if it
moves toward a trough it tends to take density away from the peak or
fill in the trough and the density becomes flatter.
But the scattering vector of a sym-related reflection is at a different
angle, anywhere from almost 0 to 90 degrees from its mate (actually to
180 degrees, but then the Friedel mate is close to zero. It's a question of how
parallel they are, irrespective of direction). The atom we are changing
will fall at a different position along the rotated scattering vector,
and its movement may be toward a peak or trough of the projected density
on that scattering vector.
If the two reflections are close in reciprocal space, their scattering
vectors will be nearly colinear, the projection of density onto them
will be similar, and the projection of the atom being moved onto them
will come at a similar position in these projections. In that case
moving density so that its projection on one scattering vector moves
toward or away from a peak of its best-fit sine wave will have a similar
effect for the adjacent reflection, and their changes will be correlated.
But if the reflections are not close in reciprocal space, their
scattering vectors are at different angles, the projection of the
density on them looks quite different, and the projection of the atom
being moved comes at a different position. In this case it is impossible
to predict how changes in the two reflections' amplitudes due to
movement of an atom will correlate without knowing the details of the
density.
For symmetry-related reflections, the projection of density of the
rotated protomer on the scattering vector of the rotated reflection will
be the same as the projection of the density of the original protomer on
the original reflection (hence the correlation of Fc). (in case the
symmetry is actually crystallographic, as in our case, then the
projection of the entire crystal on the rotated scattering vector will
be the same as its projection on the original reflection's scattering
vector). But the change we are making is only in the original protomer,
not in its symm mate, and so its projection will fall at a different
point along the rotated scattering vector, so whether it moves density
toward a peak or trough is somewhat random.
If ncs is restrained or constrained, the changes will
also follow ncs-symmetry and so changes in Fc would be expected to be
symmetric.
I have done extensive experiments, again with the same 2CHR structure
refining with I4 symmetry, showing that when you introduce a change in
the structure by random shaking or molecular dynamics, the correlation
between changes in Fc for "ncs" symmetry related reflections is close to zero,
and occasionally negative. The slight positive average correlation may be
attributed to sym-pairs that are close in reciprocal space (like 1,0,30
and -1,0,30 if there were a 2-fold along 0,0,l) so that they are coupled
not by ncs but by the G-function. Granted changes due to shaking might
not be the same as changes due to refinement, but these were shaken
starting from the refined position, and I assume that if they were refined
from this randomly shaken position they would go back to the original
refined position, in which case the Fc changes due to refinement would
be equally uncorrelated.
----------
Coupling between reflections by the G function-
Without saying exactly what is meant by couplings, reflections can be
coupled in two ways. One, reflections are coupled to other reflections near
them in reciprocal space. This is due to the fact that the molecular
transform of the molecule is relatively smooth (due to the molecular
transform being oversampled due to the asymmetric unit being larger than the
structure contained?), so values of amplitude and
phase for a reflection cannot differ too widely from those of neighboring
reflections. Or because the scattering vectors of neighboring reflections
are nearly parallel and similar in frequency so the projection of the density
on them integrates similarly.
(second is ncs-coupling)
In general coupling of neighboring reflns is a good thing for crystallography. No one
reflection is indispensable, because its information is much the same as that of the other reflections in a
cube of 26 surrounding reflections. This allows us to solve structures when the data is only 80-90%
complete, provided the missing reflections are randomly scattered among the present reflections. It
supports the "fill-in" fft map procedure where FcΦc is used for missing reflections (the
structure based on surrounding reflections will be good enough to give a good estimate of the
missing structure factor). It makes possible resolution extension during density modification or by
the "free lunch" procedures of Dodson and Sheldrick.
And I would argue that this coupling is what makes cross-validation
(free-R) work. We say
that refining against the working reflections improves the structure, making it more
like the true structure, and thus the free Fc approach their Fobs. But not because the
good fairy looks at the structure and says "OK, it's improved now, we can lower the
R-free".
How does it work mathematically? If the reflections were completely independent, if
free and working reflections were not coupled through being samples of the same molecular
transform, then changes which improve the fit to the working reflections would have no
effect on the values of the free reflections. It has to go through the structure,
changes due to refining against the working reflections affect the free reflections,
which we can call "coupling", and we know that is described by the G-function.
If free reflections were not coupled to working reflections, Rfree would never change and
thus would be useless.
For an example, suppose we refine the position of an atom, choosing working
reflections only in the plane l=0, and free reflections along the l axis (assuming an
orthorhombic system). The working reflections are only sensitive to position in the x and
y directions, so the z position would be unchanged by the refinement. But the free
reflections are only sensitive to position along the z axis, so R-free would be
unchanged. Presumably the structure would be improved (if that one atom was slightly
misplaced and all other atoms correctly placed), but the Rfree would not improve. I would
say this is the direction Chapman and co. were heading with their thin shells of free
reflections isolated by thick shells of unused guard reflections. If they really succeed
in eliminating the "bias", then Rfree will be unresponsive to refinement and so
useless.
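The l=0 versus l-axis thought experiment can be checked numerically. The following Python sketch rests on simplifying assumptions that are not in the original message: three point atoms of unit scattering power, no symmetry (P1 with orthogonal axes rather than a real orthorhombic space group), and arbitrary invented coordinates.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """F(hkl) for unit point atoms at fractional coordinates (no symmetry)."""
    h, k, l = hkl
    return sum(cmath.exp(2j * math.pi * (h * x + k * y + l * z)) for x, y, z in atoms)


def max_amplitude_change(reflections, fixed, moving, shift):
    """Largest change in |F| over the reflections when one atom is shifted."""
    moved = tuple(c + s for c, s in zip(moving, shift))
    before = [abs(structure_factor(r, fixed + [moving])) for r in reflections]
    after = [abs(structure_factor(r, fixed + [moved])) for r in reflections]
    return max(abs(a - b) for a, b in zip(after, before))


fixed = [(0.10, 0.20, 0.30), (0.40, 0.15, 0.70)]   # atoms assumed correctly placed
moving = (0.25, 0.35, 0.45)                        # the atom being refined
working = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (2, 1, 0)]  # plane l = 0
free = [(0, 0, 1), (0, 0, 2), (0, 0, 3)]                # along the l axis

dz_work = max_amplitude_change(working, fixed, moving, (0.0, 0.0, 0.02))
dz_free = max_amplitude_change(free, fixed, moving, (0.0, 0.0, 0.02))
dx_work = max_amplitude_change(working, fixed, moving, (0.02, 0.0, 0.0))
dx_free = max_amplitude_change(free, fixed, moving, (0.02, 0.0, 0.0))
print(dz_work, dz_free, dx_work, dx_free)
```

In this toy setting a shift along z leaves every l=0 amplitude untouched, and a shift along x leaves every 00l amplitude untouched, so the two sets carry no information about each other's coordinate.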
Chapman et al. considered two kinds of coupling: that due to ncs and
direct coupling via Rossmann's G function. They found that choosing free set
in thin shells had little effect, in fact very thick shells with the
test reflections centered in the middle of the shell were required to
significantly reduce the "bias". Now the reciprocal space equivalent of
ncs operators are pure rotational operators, so they relate points in
reciprocal space with precisely the same resolution. Selecting free
reflections in thin shells should thus be sufficient to ensure that
ncs-related reflections have the same free-R flag and avoid bias. For
my case where ncs is really crystallographic, the shells could be
infinitely thin since the symm-related reflections have precisely the
same resolution. For real ncs the operator takes a reflection to a
non-Bragg position which is closely surrounded by reflections, coupled
to them by the G function.
In that case somewhat thicker shells would be required. But using very
thick guard zones around the free reflections implies it is the
G-function they are fighting, as they somewhat implicitly acknowledged by
the discussion of thickness of shells in terms of the radius of the
central maximum of the G function. In that case I wonder if ncs-coupling,
which still has to go through G-function coupling to bias a free
reflection, contributes significantly compared to the coupling of every
reflection to its direct neighbors.
By using thick guard zones of unused reflections, they end up refining with
very incomplete data which would be expected to affect the refinement and raise
the R-free just because the structure is less correct. They control for this by
refining with another set in which the same number of reflections are deleted
randomly. But this is not a satisfactory control, because it is generally
agreed that missing reflections due to an empty zone in reciprocal space is
more deleterious than missing reflections that are randomly scattered.
Ironically this same "redundancy due to oversampling" that Chapman and co.
discuss in their introduction allows neighboring reflections to impart most of the
information of an isolated absent reflection. When the missing reflections are clustered
together in a thick shell or wedge, a lot of information is not available and the
structure will suffer. And in particular the structural details that determine structure
factors in the center of the excluded zone will be poorly determined, since information
pertaining to them is being excluded. So of course the R-factor calculated from these
reflections will be higher than with randomly absent data. Furthermore, if G-function is
the vehicle by which R-free follows R, R-free will follow less closely and hence
under-report what improvement is being made.
>
> On Sun, 19 May 2019 at 04:34, Edward A. Berry <ber...@upstate.edu> wrote:
>
> Revisiting (and testing) an old question:
>
> On 08/12/2003 02:38 PM, wgsc...@chemistry.ucsc.edu wrote:
>
> > On 08/12/2003 06:43 AM, Dirk Kostrewa wrote:
> >>
> >> (1) you only need to take special care for choosing a test set if you _apply_
> >> the NCS in your refinement, either as restraints or as constraints. If you
> >> refine your NCS protomers without any NCS restraints/constraints, both your
> >> protomers and your reflections will be independent, and thus no special care
> >> for choosing a test set has to be taken
> >
> > If your space group is P6 with only one molecule in the asymmetric unit but you
> > instead choose the subgroup P3 in which to refine it, and you now have two molecules
> > per asymmetric unit related by "local" symmetry to one another, but you don't apply it,
> > does that mean that reflections that are the same (by symmetry) in P6 are uncorrelated
> > in P3 unless you apply the "NCS"?
>
> ===================================================
> The experiment described below seems to show that Dirk's initial
> statement was correct: even in the case where the "ncs" is actually
> crystallographic, and the free set is chosen randomly, R-free is not
> affected by how you pick the free set. A structure is refined with
> artificially low symmetry, so that a 2-fold crystallographic operator
> becomes "NCS". Free reflections are picked either randomly (in which
> case the great majority of free reflections are related by the NCS to
> working reflections), or taking the lattice symmetry into account so
> that symm-related pairs are either both free or both working. The final
> R-factors are not significantly different, even with repeating each mode
> 10 times with independently selected free sets. They are also not
> significantly different from the values obtained refining in the correct
> space group, where there is no ncs.
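The two ways of picking free reflections compared in this experiment can be sketched in Python as below. This is an illustration only and is not how phenix implements use_lattice_symmetry: it assumes the mate of (h,k,l) is (k,h,l), as suggested by the h>k / h<k split used later in the message, and it uses a toy list of positive indices with hypothetical function names.

```python
import random


def pair_key(hkl):
    """Key shared by a reflection and its assumed mate, (h,k,l) <-> (k,h,l)."""
    h, k, l = hkl
    return (min(h, k), max(h, k), l)


def assign_free_flags(reflections, fraction=0.10, seed=0, by_pairs=True):
    """Map each reflection to True (free) or False (working)."""
    rng = random.Random(seed)
    if by_pairs:
        # Draw whole pairs, so mates always end up in the same set.
        units = sorted({pair_key(r) for r in reflections})
        free_units = set(rng.sample(units, round(fraction * len(units))))
        return {r: pair_key(r) in free_units for r in reflections}
    # Draw individual reflections with no regard for their mates.
    units = sorted(reflections)
    free_units = set(rng.sample(units, round(fraction * len(units))))
    return {r: r in free_units for r in reflections}


reflections = [(h, k, l) for h in range(1, 6) for k in range(1, 6) for l in range(1, 5)]
paired = assign_free_flags(reflections, by_pairs=True)
unpaired = assign_free_flags(reflections, by_pairs=False)
mates_agree = all(paired[(h, k, l)] == paired[(k, h, l)] for h, k, l in reflections)
print(mates_agree, sum(paired.values()), sum(unpaired.values()))
```

Drawing by pairs fixes the fraction of pairs rather than of reflections, so the number of free reflections varies slightly with how many invariant (h=k) reflections are drawn.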
>
> Maybe this is not really surprising. Since symmetry-related reflections
> have the same resolution, picking free reflections this way is one way
> of picking them in (very) thin shells, and this has been reported not to
> avoid bias: See Table 2 of Kleywegt and Brunger Structure 1996, Vol 4,
> 897-904. Also results of Chapman et al. (Acta Cryst. D62, 227–238). And see:
> http://www.phenix-online.org/pipermail/phenixbb/2012-January/018259.html
>
> But this is more significant: in cases of lattice symmetry like this,
> the ncs takes working reflections directly onto free reflections. In the
> case of true ncs the operator takes the reflection to a point between
> neighboring reflections, which are closely coupled to that point by the
> Rossmann G function. Some of these neighbors are outside the thin shell
> (if the original reflection was inside; or vice versa), and thus defeat
> the thin-shells strategy. In our case the symm-related free reflection
> is directly coupled to the working reflection by the ncs operator, and
> its neighbors are no closer than the neighbors of the original
> reflection, so if there is bias due to NCS it should be principally
> through the sym-related reflection and not through its neighbors. And so
> most of the bias should be eliminated by picking the free set in thin
> shells or by lattice symmetry.
>
> Also, since the "ncs" is really crystallographic, we have the control of
> refining in the correct space group where there is no ncs. The R-factors
> were not significantly different when the structure was refined in the
> correct space group. (Although it could be argued that that leads to a
> better structure, and the only reason the R-factors were the same is
> that bias in the lower symmetry refinement resulted in lowering Rfree
> to the same level.)
>
> Just one example, but it is the first I tried- no cherry-picking. I
> would be interested to know if anyone has an example where taking
> lattice symmetry into account did make a difference.
>
> For me the lack of effect is most simply explained by saying that, while
> of course ncs-related reflections are correlated in their Fo's and Fc's,
> and perhaps in their |Fo-Fc|'s, I see no reason to expect that the
> _changes_ in |Fo-Fc| produced by a step of refinement will be correlated
> (I can expound on this). Therefore whatever refinement is doing to
> improve the fit to working reflections is equally likely to improve or
> worsen the fit to sym-related free reflections. In that case it is hard
> to see how refinement against working reflections could bias their
> symm-related free reflections. (Then how does R-free work? Why does
> R-free come down at all when you refine? Because of coupling to
> neighboring working reflections by the G-function?)
>
> Summary of results (details below):
> 0. structure 2CHR, I422 (as reported in PDB, with 2-Sigma cutoff)
> R: 0.189 Rfree: 0.264 Nfree:442(5%) Nrefl: 9087
>
> 1. The deposited 2chr (I422) was refined in that space group with the
> original free set. No Sigma cutoff, 10 macrocycles.
> R: 0.1767 Rfree: 0.2403 Nfree:442(5%) Nrefl: 9087
>
> 2. The deposited structure was refined in I422 10 times, 50 macrocycles
> each, with randomly picked 10% free reflections
> R: 0.1725±0.0013 Rfree: 0.2507±0.0062 Nfree: 908.9±0.32 Nrefl: 9087
>
> 3. The structure was expanded to an I4 dimer related by the unused I422
> crystallographic operator, matching the dimer of 1chr. This dimer was
> refined against the original (I4) data of 1chr, picking free reflections
> in symmetry related pairs. This was repeated 10 times with different
> random seeds for picking reflections.
> R: 0.1666±0.0012 **Rfree:0.2523±0.0077 Nfree: 1601.4 Nrefl:16011
>
> 4. same as 3 but picking free reflections randomly without regard for
> lattice symmetry.
> On average 15 free reflections were in pairs, 212 were invariant under
> the operator (no sym-mate) and 1374 (86%) were paired with working
> reflections.
> R: 0.1674±0.0017 **Rfree:0.2523±0.0050 Nfree: 1600.9 Nrefl:16011
>
> (**-Average Rfree almost identical by coincidence- the individual
> results were all different)
>
> Detailed results from the individual refinement runs are available in
> spreadsheet in dropbox:
> https://www.dropbox.com/s/fwk6q90xbc5r8n1/NCSbias.xls?dl=0
> Scripts used in running the tests are also there in NCSbias.tgz:
> https://www.dropbox.com/s/sul7a6hzd5krppw/NCSbias.tgz?dl=0
>
> ========================================
>
> Methods:
> I would like an experiment where relatively complete data is available
> in the lower symmetry. To get something that is available to everyone, I
> choose from the PDB. A good example is 2CHR, in space group I422, which
> was originally solved and the data deposited in I4 with two molecules in
> the asymmetric unit (structure 1CHR).
>
> 2CHR statistics from the PDB:
> R      R-free  complete
> 0.189  0.264   81.4
> (Refined 8.0 to 3.0 A; reported in PDB, with 2-Sig cutoff)
> Nfree=442 (4.86%)
> Further refinement in phenix with same free set, no sigma cutoff:
> 10 macrocycles bss, indiv XYZ, indiv ADP refinement; phenix default
> Resol 37.12 - 3.00 A 92.95% complete, Nrefl=9087 Nfree=442(4.86%)
> Start: r_work = 0.2097 r_free = 0.2503 bonds = 0.008 angles = 1.428
> Final: r_work = 0.1787 r_free = 0.2403 bonds = 0.011 angles = 1.284
> (2chr_orig_001.pdb,
>
> The number of free reflections is small, so the uncertainty
> in Rfree is large (a good case for Rcomplete)
> Instead for better statistics, use new 10% free set and repeat 10 times;
> 50 macrocycles, with different random seeds:
> R: 0.1725±0.0013 Rfree: 0.2507±0.0062 bonds:0.010 Angles:1.192
> Nfree: 908.9±0.32 Nrefl: 9087
>
> For artificially low symmetry, expand the I422 structure (making what I
> call 3chr for convenience although I'm sure that ID has been taken):
>
> pdbset xyzin 2CHR.pdb xyzout 3chr.pdb <<eof
> exclude header
> spacegroup I4
> cell 111.890 111.890 148.490 90.00 90.00 90.00
> symgen X,Y,Z
> symgen X,1-Y,1-Z
> CHAIN SYMMETRY 2 A B
> eof
>
> Get the structure factors from 1CHR: 1chr-sf.cif
> Run phenix.refine on 3chr.pdb with 1chr-sf.cif.
> This file has no free set (deposited 1993) so tell phenix to generate
> one. I don't want phenix to protect me from my own stupidity, so I use:
> generate = True
> use_lattice_symmetry = False
> use_dataman_shells = False
> (the .eff file with all non-default parameters is available as
> 3chr_rand_001.eff in the .tgz mentioned above)
>
> For more significance, use the script multirefine.csh to repeat the
> refinement 10 times with different random seed. After each run, grep significant
> results into a log file.
>
>
> To check this gives free reflections related to working reflections, I
> used mtz2various and a fortran prog (sortfree.f in .tgz) to separate the
> data (3chr_rand_data.mtz) into two asymmetric units: h,k,l with h>k
> (columns 4-5) and with h<k (col 6-7), and listed the pairs, thusly:
>
> mtz2various hklin 3chr_rand_data.mtz hklout temp.hkl <<eof
> LABIN FP=F-obs DUM1=R-free-flags
> OUTPUT USER '(3I4,2F10.5)'
> eof
> sortfree <<eof >sort3.hkl
>
> sort3.hkl looks like:
> ______h>k______ ______h<k______
> h k l F free F* free*
> 1 2 3 208.97 0.00 174.95 0.00
> 1 2 5 226.85 0.00 191.65 0.00
> 1 2 7 144.85 0.00 164.86 0.00
> 1 2 9 251.26 0.00 261.71 0.00
> 1 2 11 333.84 0.00 335.18 0.00
> 1 2 13 800.37 0.00 791.77 0.00
> 1 2 15 412.92 0.00 409.90 0.00
> 1 2 17 306.99 0.00 317.53 0.00
> 1 2 19 225.54 0.00 220.91 0.00
> 1 2 21 101.20 1.00* 104.84 0.00
> 1 2 23 156.27 0.00 156.49 0.00
> 1 2 25 202.97 0.00 202.23 0.00
> 1 2 27 216.10 0.00 219.28 0.00
> 1 2 29 106.76 0.00 100.93 0.00
> 1 2 31 157.32 0.00 154.37 1.00*
> 1 2 33 71.84 0.00 20.78 0.00
> 1 2 35 179.05 0.00 165.67 0.00
> 1 2 37 254.04 0.00 239.96 1.00*
> 1 2 39 69.56 0.00 30.61 0.00
> 1 2 41 56.20 0.00 51.02 0.00
>
> I then used awk to look for 1 in the free columns. Out of 6922 pairs of
> reflections, in one case:
> 674 in the first asu (h>k) are in the free set,
> 703 in the second asu (h<k) are in the free set,
> and only 11 pairs have the reflections in both asu free.
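
The awk step amounts to counting flags in the two free columns of sort3.hkl. An equivalent sketch in Python, run on three rows copied from the listing above; it assumes whitespace-separated columns and that the trailing '*' is only a marker to be stripped. The function name is hypothetical.

```python
rows = """\
h k l F free F* free*
1 2 19 225.54 0.00 220.91 0.00
1 2 21 101.20 1.00* 104.84 0.00
1 2 31 157.32 0.00 154.37 1.00*
"""

def count_free(text):
    """Count pairs with the first, the second, or both reflections free."""
    first = second = both = 0
    for line in text.splitlines():
        fields = line.split()
        try:
            free1 = float(fields[4].rstrip("*")) == 1.0
            free2 = float(fields[6].rstrip("*")) == 1.0
        except (IndexError, ValueError):
            continue  # header, blank or malformed line
        first += free1
        second += free2
        both += free1 and free2
    return first, second, both

print(count_free(rows))
```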
>
> Out of 16011 reflections in I4:
> 6922 pairs (=13844 reflections), 1049 invariant (h=k or h=0), 1118 with
> absent mate.
>
> out of 1601 free reflections:
> On average 15 free reflections were in pairs, 212 were invariant under
> the operator (no sym-mate) and 1374 (86%) were paired with working
> reflections.
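
The bookkeeping above can be reproduced from a plain reflection list. The sketch below assumes indices already reduced to h, k, l >= 0 and that the operator relating the two I4 asymmetric units takes (h,k,l) to (k,h,l), which is consistent with the h=k and h=0 reflections being invariant. Free reflections that are invariant or whose mate is absent are lumped together as having no sym-mate. The function name and the toy data are hypothetical, and this is not sortfree.f.

```python
def classify(refl):
    """refl maps (h, k, l) -> free flag (True/False)."""
    counts = {"pairs": 0, "invariant": 0, "absent_mate": 0,
              "free_in_free_pair": 0, "free_with_working_mate": 0,
              "free_no_mate": 0}
    for (h, k, l), free in refl.items():
        mate = (k, h, l)
        if h == k or h == 0 or k == 0:
            counts["invariant"] += 1
            counts["free_no_mate"] += free
        elif mate not in refl:
            counts["absent_mate"] += 1
            counts["free_no_mate"] += free
        else:
            if h < k:
                counts["pairs"] += 1  # count each pair once
            if free:
                key = "free_in_free_pair" if refl[mate] else "free_with_working_mate"
                counts[key] += 1
    return counts

# Toy data only (h+k+l even, as required for an I lattice).
toy = {
    (1, 2, 3): False, (2, 1, 3): True,   # free paired with working
    (1, 3, 4): True,  (3, 1, 4): True,   # both mates free
    (2, 2, 4): False, (0, 2, 4): True,   # invariant under the operator
    (1, 4, 5): False,                    # mate (4,1,5) not measured
}
result = classify(toy)
print(result)

# Consistency check on the totals quoted above.
assert 2 * 6922 + 1049 + 1118 == 16011
assert 15 + 212 + 1374 == 1601
```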
>
> Then do 10 more runs of 50 macrocycles with:
> use_lattice_symmetry = False
> collecting the same statistics
> (also scripted in multirefine.csh)
>
> Finally, use ref2chr.eff to refine (as previously mentioned) a monomer
> in I422 (2chr.pdb) 10 times with 10% free, 50 macrocycles
> (also scripted in multirefine.csh)
>
>
########################################################################
>
> To unsubscribe from the CCP4BB list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=CCP4BB&A=1