Dear Ian, Those are good points, but it’s worth noting that in the plots that Phaser produces for the observed vs theoretical second moments, the effects of the intensity standard deviations are taken into account in the theoretical curve. Seeing that the curves match serves as evidence both that the crystal is untwinned and that the standard deviations have been estimated appropriately.
Best wishes, Randy ----- Randy J. Read Department of Haematology, University of Cambridge Cambridge Institute for Medical Research Tel: +44 1223 336500 The Keith Peters Building Hills Road E-mail: [email protected] Cambridge CB2 0XY, U.K. www-structmed.cimr.cam.ac.uk From: CCP4 bulletin board <[email protected]> on behalf of Ian Tickle <[email protected]> Date: Wednesday, 3 December 2025 at 17:05 To: [email protected] <[email protected]> Subject: Re: [ccp4bb] [proc-develop] Fwd: [ccp4bb] How to divide an equal amount of reflections into 20 bins? (like Staraniso does it) Dear All I thought it worth adding for completeness a further point concerning anisotropic cutoffs to Clemens's excellent explanation. I have come across claims that an anisotropic cutoff distorts the intensity statistics. I will argue and demonstrate that actually the exact opposite is true: NOT using an anisotropic cutoff can bias the moments away from their expected values based on error-free data. For the error-free acentric Wilson distribution it's easy to show that the 2nd raw moment E(Z^2) of the normalised intensity Z = I / E(I) is 2. The r'th raw moment of a random variable x is the statistical expectation (mean value) of the r'th power of x (x^r: r need not be an integer). A "raw moment" is one where the deviations are relative to zero as opposed to a "central moment" where deviations are relative to the mean (the mean is the 1st raw moment, the variance is the 2nd central moment, the skewness is the 3rd central moment etc.). Moments have an important practical application in the detection of twinning. In reality there will always be experimental errors and if we assume the usual approximation of Gaussian random errors in the intensities, one can show that the 2nd raw moment of the normalised intensity for the acentric Wilson distribution is: E(Z^2) = 2 + <sigma(I)^2> / S^2 where S is the Wilson distribution parameter. This means that for weak data where sigma(I) < S, the errors have the effect of inflating the moments (one can show by generalising the above expression that the effect is even greater for higher order moments r > 2). Since the effect of twinning is to reduce all the moments with r > 1 (i.e. the twinned intensity distribution has fewer extreme values compared with the untwinned case), the effect of random errors will be to mask the effects of twinning in the tests. For my test data (courtesy of Eleanor), STARANISO gives anisotropic diffraction limits (1.96, 1.62, 1.40) so that for reflections at lower resolution than the lowest d* (1/1.96) the data are almost complete and the effect of weak data on the moments is negligible. Between the lower and upper limits (d* = 1/1.96 to 1/1.40) the proportion of weak data inside a sphere will progressively increase causing the moments to increase according to the equation above. Beyond the bounding sphere of radius d* = 1/1.4 where all the data are weak, the moments tend to infinity as S approaches zero, so one definitely does not want to include those, but by the same argument one does not want to include weak data between the anisotropic cutoff surface and the bounding sphere either. This is where the anisotropic cutoff comes in. The usual isotropic completeness will progressively decrease between these limits, whereas the more relevant anisotropic completeness (defined as the proportion of possible observed reflections that are actually observed) will remain high, and the moments will remain at their expected values for error-free data. It is not possible for weak data to be observed (by definition) so they should not be counted as 'possibles'. The attached plot demonstrates this for the test data (acentric reflections only). This was run first with CTRUNCATE which applies an anisotropy correction but not an anisotropic cutoff. CTRUNCATE outputs the same reflections used for the plot to its output MTZ file. Then the same data was run with STARANISO and the output of that put through CTRUNCATE just to get comparable plots (in that case CTRUNCATE would not apply an anisotropy correction because the data have already been anisotropy-corrected by STARANISO). STARANISO only outputs the reflections used in the plot. One can see exactly the effects described above in the plots. There are progressively fewer reflections in the range d* = 1/1.96 to 1/1.4 in the STARANISO plot because they have been cut by the anisotropic cutoff leaving only the "good stuff". Note that CTRUNCATE mislabels the plots as moments of I (<I^2>) whereas they are plainly moments of Z (<Z^2> = <I^2> / <I>^2). One final point: the equation shows that tests should always be performed on expectations NOT on measured values: this is very important. STARANISO goes to considerable lengths to compute expectations, typically using local averages or global solid harmonics function fitting. So a reflection is classed as "weak" only when the local average or interpolated mean I/sigma(I) of neighbouring reflections is low (usually < 1.2), not necessarily when the individual measured I/sigma(I) is low. Some legacy programs apply cutoffs using the measured I/sigma(I) or F/sigma(F) as a criterion: never a good idea! Cheers -- Ian (Global Phasing, Cambridge, UK). > ---------- Forwarded message --------- > From: Clemens Vonrhein > <[email protected]<mailto:[email protected]>> > Date: Mon, 24 Nov 2025 at 18:58 > Subject: Re: [ccp4bb] How to divide an equal amount of reflections into 20 > bins? (like Staraniso does it) > To: <[email protected]<mailto:[email protected]>> > > > Dear all, > > Just a few technical points and clarifications ... to avoid > unneceessary confusion in the future (CCP4bb postings can have a long > shelf life) :-) > > On Tue, Nov 18, 2025 at 06:55:15PM +0000, Martin Moche wrote: > > As we all know STARANISO selects the best ellipse, instead of > > sphere, when integrating reflections. > > STARANISO does not do any integration: that is part of the actual > integration program reading the raw diffraction images and turning > them into a list of unscaled+unmerged intensities (and sigmas). > > The next step then is to determine appropriate scale/correction > factors to turn those into a set of scaled+unmerged > intensities. Those could be fed into STARANISO, but more typically it > uses the scaled+merged (using inverse-variance weighting) intensities > to determine a cutoff surface. > > That cutoff can have any shape that is constrained to the point-group > symmetry (plus a centre of symmetry): the term "anisotropic" does not > imply that it has to be an ellipsoid - just that it is not constrained > to be isotropic (as in a spherical cutoff surface based on per-bin > statistics and a single high-resolution value) [1]. > > STARANISO fits an ellipsoid to that generic anisotropic cutoff surface > for two main reasons: (1) providing three diffraction limits along the > axes of that ellipsoid to have a reasonably simple description of any > potential anisotropic diffraction, and (2) computing statistics - > especially completeness - to accurately describe the outcome of the > diffraction experiment. > > > In my experience this is good because > > > > > > 1. Crystals generally diffract better in some directions than > > others > > 2. The electron density maps look better after STARANISO elliptical > > data truncation, as compared to spherical data truncation in XDS, > > XDSAPP3, DIALS etc. so it becomes easier and faster to build a > > model and refine a structure. > > One caveat depending on the refinement program/procedure used: the DFc > completion used by most (all?) refinement programs will provide purely > model-based map coefficients to the electron density map for those > reflections missing in the initial list of reflections in the input > MTZ file. That is a sensible approach when those reflections are > missing because they weren't measured although we assume they should be > observable (think: reflections behind beamstop, in detector module > gaps or lost due to cusp). > > It becomes problematic when there is no check against "observability" > at the map computation stage and one just uses all reflections with > Miller indices in the MTZ file to assign DFc values to the 2mFo-DFc > electron density maps when there is no Fo value - e.g. for highly > anisotropic data (where a large number of reflections are unobservable > in some directions at higher resolution and those are therefore not > present in the list of Fo). > > This is why BUSTER will output three different sets of > electron-density map coefficients: > > 2FOFCWT / PH2FOCWT <<< for all Fo > 2FOFCWT_aniso-fill / PH2FOCWT_aniso-fill <<< for all HKL within the > cutoff surface, i.e. defined as "observable" > 2FOFCWT_iso-fill / PH2FOCWT_iso-fill <<< to the highest d* of any > Miller index in the input file > > (Other refinement packages do something similar and provide several > map coefficients). As you can see, using the last set (map > coefficients in a sphere) can create artificially "nice" maps due to > model bias: if you see near perfect density over the full model you > might want to check if the correct electron density map is used > (2FOFCWT_aniso-fill / PH2FOCWT_aniso-fill is what we would recommend). > > If you used the STARANISO output file as input into refinement and on > output you only get that last set of map coefficients (i.e. all > reflections in a sphere to the high resolution limit), you could run > something like the following set of commands to fix this: > > cad hklin1 staraniso_alldata-unique.mtz \ > hklin2 refinement_out.mtz \ > hklout tmp.mtz \ > <<e > LABI FILE 1 E1=SA_flag > LABI FILE 2 E1=F E2=SIGF E3=FWT E4=PHWT > e > > sftools <<e > read tmp.mtz > select col SA_flag > 0 > calc F col 2FOFCWT_aniso-fill = col FWT > calc P col PH2FOFCWT_aniso-fill = col PHWT > select all > calc F col 2FOFCWT_iso-fill = col FWT > calc P col PH2FOFCWT_iso-fill = col PHWT > delete col FWT > delete col PHWT > select col F = present > calc F col 2FOFCWT = col 2FOFCWT_iso-fill > calc P col PH2FOFCWT = col PH2FOFCWT_iso-fill > select all > write e-density.mtz > stop > e > > Now you'll have all three sets of electron-density map coefficients at > your disposal. > > The above steps might be necessary especially when you run refinement > through an interface that tries to be helpful and "completes" the > incoming MTZ file with test-set flags right to largest d* value of any > observed reflection [2]. Those dummy Miller indices (denoting a > reciprocal lattice point with no associated data) will receive purely > model-based DFc values for electron density map computations. > > > To my understanding AIMLESS assumes 100 % completeness when creating > > its 20 bins, and elliptically truncated data > > STARANISO doesn't truncate the data ellipsoidally: it uses the > anisotropic cutoff surface as described above (the ellipsoid is just a > construct to make reporting easier). > > > has extremely low completeness in the highest resolution shells, > > True - if we assume that all reciprocal lattice points (Miller > indices) within the sphere out to the high-resolution value of the > outer shell are potentially observable. But that isn't quite what > "completeness" tries to measure: we want to know how many reflections > we measured relative to the number of reflections we could've > measured. This is something quite different. > > If we had a crystal with perfect isotropic diffraction we are happy to > define some outer limit (i.e. a sphere described by a radius) where we > say: "We could have measured all reflections within that sphere and > everything outside of that sphere is just not observable (e.g. too > weak)". That makes complete sense: we don't want to pretend that every > crystal diffracts to 0.5A resolution and therefore we should compute > overall completeness always to a high-resolution limit of 0.5A > ... that would be silly. > > The same applies to a crystal with anisotropic diffraction: we need to > define a region in reciprocal space where we can say the same thing > ("inside is observable and outside is not"). STARANISO provides us > that shape as an ellipsoid defined by three axis lengths (diffraction > limits). So now we compute the completeness relative to the > reflections within that shape - not much different to the concept of a > sphere, but more accurate in this case. > > > so Table 1 is looking very odd with extremely low completeness in > > the highest resolution shell when using REFMAC5 after STARANISO. > > That is why STARANISO (or autoPROC) report the "Completeness > (ellipsoidal)" in their Table1 and the deposition-ready mmCIF file [3] > - which is what we think should be reported in papers etc. > > Hope that helps to clarify some of the finer details. > > Cheers > > Clemens & Ian > > [1] https://staraniso.globalphasing.org/anisotropy_about.html > [2] https://staraniso.globalphasing.org/test_set_flags_about.html > [3] see eg. https://staraniso.globalphasing.org/table1/q6/8q68.html > > ######################################################################## > > To unsubscribe from the CCP4BB list, click the following link: > https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1 > > This message was issued to members of > www.jiscmail.ac.uk/CCP4BB<http://www.jiscmail.ac.uk/CCP4BB>, a mailing > list hosted by www.jiscmail.ac.uk<http://www.jiscmail.ac.uk/>, terms & > conditions are available at > https://www.jiscmail.ac.uk/policyandsecurity/ > _______________________________________________ > proc-develop mailing list > [email protected]<mailto:[email protected]> > http://wwi.globalphasing.com/mailman/listinfo/proc-develop -- *-------------------------------------------------------------- * Clemens Vonrhein, Ph.D. vonrhein AT GlobalPhasing DOT com * Global Phasing Ltd., 9 Journey Campus, Castle Park * Cambridge CB3 0AX, UK www.globalphasing.com<http://www.globalphasing.com/> *-------------------------------------------------------------- ________________________________ To unsubscribe from the CCP4BB list, click the following link: https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1 ######################################################################## To unsubscribe from the CCP4BB list, click the following link: https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1 This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list hosted by www.jiscmail.ac.uk, terms & conditions are available at https://www.jiscmail.ac.uk/policyandsecurity/
