On Thu, Mar 31, 2011 at 10:43 PM, James Holton <jmhol...@lbl.gov> wrote:
> I have the 2002 edition, and indeed it only contains space group
> numbers up to 230.  The page numbers quoted by Ian contain space group
> numbers 17 and 18.

You need to distinguish the 'IT space group number' which indeed goes
up to 230 (i.e. the number of unique settings), from the 'CCP4 space
group number' which, peculiar to CCP4 (which is why I called it
'CCP4-ese'), adds a multiple of 1000 to get a unique number for the
alternate settings as used in the API.  The pages I mentioned show the
diagrams for IT SG #18 P22121 (CCP4 #3018), P21221 (CCP4 #2018) and
P21212 (CCP4 #18), so they certainly are all there!
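
As a rough illustration of the numbering scheme just described, here is
a minimal Python sketch that splits a CCP4 number into the IT number and
the multiple of 1000.  The function name and the small lookup table are
made up for this example (the table holds only the three settings of
IT #18 quoted above); it is not how csymlib itself works.

```python
# Only the three settings of IT #18 quoted in this message; NOT a full table.
KNOWN_SETTINGS = {18: "P21212", 2018: "P21221", 3018: "P22121"}


def split_ccp4_number(ccp4_number):
    """Return (IT number, multiple of 1000) for a CCP4 space group number.

    Hypothetical helper: it assumes the number is simply
    IT number + 1000 * n, as described in the text.
    """
    multiple, it_number = divmod(ccp4_number, 1000)
    if not 1 <= it_number <= 230:
        raise ValueError(f"{ccp4_number} does not map to an IT number in 1..230")
    return it_number, multiple


if __name__ == "__main__":
    for number, name in sorted(KNOWN_SETTINGS.items()):
        it_number, multiple = split_ccp4_number(number)
        print(f"CCP4 #{number} ({name}): IT #{it_number}, offset {multiple} x 1000")
```

All three entries reduce to IT #18; only the multiple of 1000 tells the
settings apart.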

> Although I am all for program authors building in support for the
> "screwy orthorhombics" (as I call them), I should admit that my
> fuddy-duddy strategy for dealing with them remains simply to use space
> groups 17 and 18, and permute the cell edges around with REINDEX to
> put the unique (screw or non-screw) axis on the "c" position.

Re-indexing is not an option for us (indeed if there were no
alternative, it would be a major undertaking), because the integrity
of our LIMS database requires that all protein-ligand structures from
the same target & crystal form are indexed with the same (or nearly
the same) cell and space group (and it makes life so much easier!).
With space-groups such as P22121 it can happen (indeed it has
happened) that it was not possible to define the space group correctly
at the processing stage due to ambiguous absences; indeed it was only
after using the "SGALternative ALL" option in Phaser and refining each
TF solution that we identified the space group correctly as P22121.

Having learnt the lesson the hard way, we routinely use P222 for all
processing of orthorhombics, which of course always gives the
conventional a<b<c setting, and only assign the space group well down
the pipeline and only when we are 100% confident; by that time it's
too late to re-index (indeed why on earth would we want to give
ourselves all that trouble?).  This is therefore totally analogous to
the scenario of yesteryear that I described where it was common to see
a 'unit cell' communication followed some years later by the structure
paper (though we have compressed the gap somewhat!), and we base the
setting on the unit cell convention for exactly the same reason.
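
To make the point concrete: with the cell kept fixed in the a<b<c
setting, the space group symbol follows directly from which axes turn
out to be screw axes, so nothing needs to be re-indexed.  A minimal
sketch, assuming only the primitive orthorhombic groups discussed here
(the function name is hypothetical):

```python
from itertools import product


def primitive_orthorhombic_symbol(screw_a, screw_b, screw_c):
    """Build the symbol for a primitive orthorhombic group in a fixed cell.

    Each argument says whether the 2-fold along that axis is a 2(1) screw.
    The cell axes are never permuted; only the symbol changes.
    """
    axes = (screw_a, screw_b, screw_c)
    return "P" + "".join("21" if is_screw else "2" for is_screw in axes)


if __name__ == "__main__":
    # The eight candidates that share a P222 lattice in one fixed setting.
    for screws in product((False, True), repeat=3):
        print(screws, primitive_orthorhombic_symbol(*screws))
```

For example, screws along b and c only give P22121, while screws along
a and b only give P21212.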

It's only if you're doing 1 structure at a time that you can afford
the luxury of re-indexing - and also the pain: many times I've seen
even experienced people getting their files mixed up and trying to
refine with differently indexed MTZ & PDB files (why is my R factor so
high?)!  My advice would be - _never_ re-index!
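
For anyone who does end up with mixed files, a simple sanity check on
the two cells catches the mistake before refinement.  This is only a
sketch under stated assumptions: the function names and tolerances are
invented for illustration, the cells are passed in as plain
(a, b, c, alpha, beta, gamma) tuples rather than read from MTZ or PDB
files, and the permutation test is valid only for orthorhombic cells
(all angles 90 degrees).

```python
from itertools import permutations


def cells_match(cell_a, cell_b, length_tol=0.02, angle_tol=1.0):
    """True if two (a, b, c, alpha, beta, gamma) cells agree.

    length_tol is a relative tolerance on the edges, angle_tol is in
    degrees; both values are arbitrary assumptions.
    """
    lengths_ok = all(
        abs(x - y) <= length_tol * max(x, y)
        for x, y in zip(cell_a[:3], cell_b[:3])
    )
    angles_ok = all(
        abs(x - y) <= angle_tol for x, y in zip(cell_a[3:], cell_b[3:])
    )
    return lengths_ok and angles_ok


def looks_reindexed(cell_a, cell_b, **tolerances):
    """True if the cells disagree as given but agree after permuting edges.

    Orthorhombic only: the angles are left untouched when permuting.
    """
    if cells_match(cell_a, cell_b, **tolerances):
        return False
    for order in permutations(range(3)):
        permuted = tuple(cell_b[i] for i in order) + tuple(cell_b[3:])
        if cells_match(cell_a, permuted, **tolerances):
            return True
    return False
```

If looks_reindexed() returns True for the cell from the reflection file
and the cell from the coordinate file, the two were most likely indexed
differently.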

-- Ian


>  I have
> yet to encounter a program that gets broken when presented with data
> that doesn't have a<b<c, but there are many non-CCP4 programs out
> there that still don't seem to understand P22121, P21221, P2122 and
> P2212.

I find that surprising!  Exactly which 'many' programs are those?  You
really should report them to CCP4 (or to me if it's one of mine) so
they can be fixed!  We've been using CCP4 programs as integral
components of our processing pipeline (from data processing through to
validation) for the last 10 years and I've never come across one
that's broken in the way you describe (I've found many broken for
other reasons and either fixed them myself or reported them - you should
do the same!).  Any program which uses csymlib with syminfo.lib can
automatically handle all space groups defined in syminfo, which
includes all the common alternates you mentioned (and others such as
I2).  The only program I'm aware of that's limited to the standard
settings is sftools (because it has its own internal space group table
- it would be nice to see it updated to use syminfo!).

> This is not the only space group convention "issue" out there!  The
> R3x vs H3x business continues to be annoying to this day!

Yeah to that!  H centring was defined in IT long ago (look it up) and
it has nothing to do with the R setting!

-- Ian
