Re: [ccp4bb] How to divide an equal amount of reflections into 20 bins? (like Staraniso does it)

Clemens Vonrhein Mon, 24 Nov 2025 10:58:59 -0800

Dear all,

Just a few technical points and clarifications ... to avoid
unneceessary confusion in the future (CCP4bb postings can have a long
shelf life) :-)

On Tue, Nov 18, 2025 at 06:55:15PM +0000, Martin Moche wrote:
> As we all know STARANISO selects the best ellipse, instead of
> sphere, when integrating reflections.

STARANISO does not do any integration: that is part of the actual
integration program reading the raw diffraction images and turning
them into a list of unscaled+unmerged intensities (and sigmas).

The next step then is to determine appropriate scale/correction
factors to turn those into a set of scaled+unmerged
intensities. Those could be fed into STARANISO, but more typically it
uses the scaled+merged (using inverse-variance weighting) intensities
to determine a cutoff surface.

That cutoff can have any shape that is constrained to the point-group
symmetry (plus a centre of symmetry): the term "anisotropic" does not
imply that it has to be an ellipsoid - just that it is not constrained
to be isotropic (as in a spherical cutoff surface based on per-bin
statistics and a single high-resolution value) [1].

STARANISO fits an ellipsoid to that generic anisotropic cutoff surface
for two main reasons: (1) providing three diffraction limits along the
axes of that ellipsoid to have a reasonably simple description of any
potential anisotropic diffraction, and (2) computing statistics -
especially completeness - to accurately describe the outcome of the
diffraction experiment.

> In my experience this is good because
>
>
> 1. Crystals generally diffract better in some directions than
>    others
> 2. The electron density maps look better after STARANISO elliptical
>    data truncation, as compared to spherical data truncation in XDS,
>    XDSAPP3, DIALS etc. so it becomes easier and faster to build a
>    model and refine a structure.

One caveat depending on the refinement program/procedure used: the DFc
completion used by most (all?) refinement programs will provide purely
model-based map coefficients to the electron density map for those
reflections missing in the initial list of reflections in the input
MTZ file. That is a sensible approach when those reflections are
missing because they weren't measured although we assume they should be
observable (think: reflections behind beamstop, in detector module
gaps or lost due to cusp).

It becomes problematic when there is no check against "observability"
at the map computation stage and one just uses all reflections with
Miller indices in the MTZ file to assign DFc values to the 2mFo-DFc
electron density maps when there is no Fo value - e.g. for highly
anisotropic data (where a large number of reflections are unobservable
in some directions at higher resolution and those are therefore not
present in the list of Fo).

This is why BUSTER will output three different sets of
electron-density map coefficients:

  2FOFCWT            / PH2FOCWT              <<< for all Fo
  2FOFCWT_aniso-fill / PH2FOCWT_aniso-fill   <<< for all HKL within the cutoff 
surface, i.e. defined as "observable"
  2FOFCWT_iso-fill   / PH2FOCWT_iso-fill     <<< to the highest d* of any 
Miller index in the input file

(Other refinement packages do something similar and provide several
map coefficients). As you can see, using the last set (map
coefficients in a sphere) can create artificially "nice" maps due to
model bias: if you see near perfect density over the full model you
might want to check if the correct electron density map is used
(2FOFCWT_aniso-fill / PH2FOCWT_aniso-fill is what we would recommend).

If you used the STARANISO output file as input into refinement and on
output you only get that last set of map coefficients (i.e. all
reflections in a sphere to the high resolution limit), you could run
something like the following set of commands to fix this:

  cad hklin1 staraniso_alldata-unique.mtz \
      hklin2 refinement_out.mtz \
      hklout tmp.mtz \
      <<e
  LABI FILE 1 E1=SA_flag
  LABI FILE 2 E1=F E2=SIGF E3=FWT E4=PHWT
  e

  sftools <<e
  read tmp.mtz
  select col SA_flag > 0
  calc F col 2FOFCWT_aniso-fill = col FWT
  calc P col PH2FOFCWT_aniso-fill = col PHWT
  select all
  calc F col 2FOFCWT_iso-fill = col FWT
  calc P col PH2FOFCWT_iso-fill = col PHWT
  delete col FWT
  delete col PHWT
  select col F = present
  calc F col 2FOFCWT = col 2FOFCWT_iso-fill
  calc P col PH2FOFCWT = col PH2FOFCWT_iso-fill
  select all
  write e-density.mtz
  stop
  e

Now you'll have all three sets of electron-density map coefficients at
your disposal.

The above steps might be necessary especially when you run refinement
through an interface that tries to be helpful and "completes" the
incoming MTZ file with test-set flags right to largest d* value of any
observed reflection [2]. Those dummy Miller indices (denoting a
reciprocal lattice point with no associated data) will receive purely
model-based DFc values for electron density map computations.

> To my understanding AIMLESS assumes 100 % completeness when creating
> its 20 bins, and elliptically truncated data

STARANISO doesn't truncate the data ellipsoidally: it uses the
anisotropic cutoff surface as described above (the ellipsoid is just a
construct to make reporting easier).

> has extremely low completeness in the highest resolution shells,

True - if we assume that all reciprocal lattice points (Miller
indices) within the sphere out to the high-resolution value of the
outer shell are potentially observable. But that isn't quite what
"completeness" tries to measure: we want to know how many reflections
we measured relative to the number of reflections we could've
measured. This is something quite different.

If we had a crystal with perfect isotropic diffraction we are happy to
define some outer limit (i.e. a sphere described by a radius) where we
say: "We could have measured all reflections within that sphere and
everything outside of that sphere is just not observable (e.g. too
weak)". That makes complete sense: we don't want to pretend that every
crystal diffracts to 0.5A resolution and therefore we should compute
overall completeness always to a high-resolution limit of 0.5A
... that would be silly.

The same applies to a crystal with anisotropic diffraction: we need to
define a region in reciprocal space where we can say the same thing
("inside is observable and outside is not"). STARANISO provides us
that shape as an ellipsoid defined by three axis lengths (diffraction
limits). So now we compute the completeness relative to the
reflections within that shape - not much different to the concept of a
sphere, but more accurate in this case.

> so Table 1 is looking very odd with extremely low completeness in
> the highest resolution shell when using REFMAC5 after STARANISO.

That is why STARANISO (or autoPROC) report the "Completeness
(ellipsoidal)" in their Table1 and the deposition-ready mmCIF file [3]
- which is what we think should be reported in papers etc.

Hope that helps to clarify some of the finer details.

Cheers

Clemens & Ian

[1] https://staraniso.globalphasing.org/anisotropy_about.html
[2] https://staraniso.globalphasing.org/test_set_flags_about.html
[3] see eg. https://staraniso.globalphasing.org/table1/q6/8q68.html

########################################################################

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list 
hosted by www.jiscmail.ac.uk, terms & conditions are available at 
https://www.jiscmail.ac.uk/policyandsecurity/

Re: [ccp4bb] How to divide an equal amount of reflections into 20 bins? (like Staraniso does it)

Reply via email to