On Thu, Mar 31, 2011 at 10:43 PM, James Holton <jmhol...@lbl.gov> wrote:

> I have the 2002 edition, and indeed it only contains space group
> numbers up to 230. The page numbers quoted by Ian contain space group
> numbers 17 and 18.

You need to distinguish the 'IT space group number', which indeed goes up to 230 (i.e. the number of unique settings), from the 'CCP4 space group number', which, peculiar to CCP4 (which is why I called it 'CCP4-ese'), adds a multiple of 1000 to get a unique number for the alternate settings as used in the API. The pages I mentioned show the diagrams for IT SG #18: P22121 (CCP4 #3018), P21221 (CCP4 #2018) and P21212 (CCP4 #18), so they certainly are all there!

> Although I am all for program authors building in support for the
> "screwy orthorhombics" (as I call them), I should admit that my
> fuddy-duddy strategy for dealing with them remains simply to use space
> groups 17 and 18, and permute the cell edges around with REINDEX to
> put the unique (screw or non-screw) axis on the "c" position.

Re-indexing is not an option for us (indeed if there were no alternative, it would be a major undertaking), because the integrity of our LIMS database requires that all protein-ligand structures from the same target & crystal form are indexed with the same (or nearly the same) cell and space group (and it makes life so much easier!). With space groups such as P22121 it can happen (indeed it has happened) that it was not possible to define the space group correctly at the processing stage due to ambiguous absences; indeed it was only after using the "SGALternative ALL" option in Phaser and refining each TF solution that we identified the space group correctly as P22121. Having learnt the lesson the hard way, we routinely use P222 for all processing of orthorhombics, which of course always gives the conventional a<b<c setting, and only assign the space group well down the pipeline and only when we are 100% confident; by that time it's too late to re-index (indeed why on earth would we want to give ourselves all that trouble?).
This is therefore totally analogous to the scenario of yesteryear that I described, where it was common to see a 'unit cell' communication followed some years later by the structure paper (though we have compressed the gap somewhat!), and we base the setting on the unit cell convention for exactly the same reason. It's only if you're doing 1 structure at a time that you can afford the luxury of re-indexing - and also the pain: many times I've seen even experienced people getting their files mixed up and trying to refine with differently indexed MTZ & PDB files (why is my R factor so high?)! My advice would be - _never_ re-index!

-- Ian

> I have
> yet to encounter a program that gets broken when presented with data
> that doesn't have a<b<c, but there are many non-CCP4 programs out
> there that still don't seem to understand P22121, P21221, P2122 and
> P2212.

I find that surprising! Exactly which 'many' programs are those? You really should report them to CCP4 (or to me if it's one of mine) so they can be fixed! We've been using CCP4 programs as integral components of our processing pipeline (from data processing through to validation) for the last 10 years and I've never come across one that's broken in the way you describe (I've found many broken for other reasons and either fixed them myself or reported them - you should do the same!). Any program which uses csymlib with syminfo.lib can automatically handle all space groups defined in syminfo, which includes all the common alternates you mentioned (and others such as I2). The only program I'm aware of that's limited to the standard settings is sftools (because it has its own internal space group table - it would be nice to see it updated to use syminfo!).

> This is not the only space group convention "issue" out there! The
> R3x vs H3x business continues to be annoying to this day!

Yeah to that! H centring was defined in IT long ago (look it up) and it has nothing to do with the R setting!

-- Ian
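
For anyone who wants to see the numbering convention spelled out, here is a minimal Python sketch of the 'CCP4 number = IT number + a multiple of 1000' rule described above. The function name split_ccp4_number is made up for illustration and is not part of csymlib or any CCP4 API; the only numbers used are the three quoted in this message (18, 2018 and 3018). It does not consult syminfo.lib, so it cannot tell whether a given multiple is actually defined there.

```python
from typing import Tuple


def split_ccp4_number(ccp4_number: int) -> Tuple[int, int]:
    """Split a CCP4-style number into (IT number, multiple of 1000).

    Hypothetical helper: it assumes only the rule stated in the message,
    i.e. CCP4 number = IT number + k * 1000 for alternate settings.
    """
    multiple, it_number = divmod(ccp4_number, 1000)
    # IT space group numbers run from 1 to 230.
    if ccp4_number < 1 or not 1 <= it_number <= 230:
        raise ValueError(f"{ccp4_number} does not map to an IT number in 1..230")
    return it_number, multiple


# The three settings of IT SG #18 quoted in the message.
QUOTED_SETTINGS = {18: "P21212", 2018: "P21221", 3018: "P22121"}

for number, symbol in QUOTED_SETTINGS.items():
    it_number, multiple = split_ccp4_number(number)
    print(f"CCP4 #{number} ({symbol}) -> IT #{it_number}, offset {multiple * 1000}")
```

All three entries come back with IT #18, which is the point: the alternate settings share one IT number and differ only in the added multiple of 1000.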