Dear Ralf,

I like your hybrid_36 scheme and will implement it and the two-character chain IDs (PDB file columns 21 and 22, right justified) when I next update SHELXL. Of course I will need to do some programming because of the SHELX 'zero dependency' philosophy, but it seems to me to be straightforward. In fact SHELXL treats the residue numbers internally as strings anyway. Since I am not in the habit of releasing new versions in a hurry, that will give time to make the necessary changes to Coot and MMDB. The great advantage of these changes is that for the large majority of PDB files, nothing will change.

Although PDB files will be with us for many years to come, there is the separate problem of depositing reflection data, for which MMCIF formats on the lines indicated by Kim would be a good solution. At the moment people who have refined a non-merohedrally twinned structure with SHELXL deposit the .fcf file (CIF, not MMCIF!). The data in this file have been 'detwinned' with the help of the refined structure and so the file can be read directly into Coot or used to calculate structure factors as if the crystal had not been twinned. Although this will still be needed, we should also be depositing the data as measured, so that the twinned refinement can be repeated or verified with another program. A similar situation arises with time-of-flight neutron diffraction data, e.g. from one of the new spallation sources. Incidentally the small molecule CIF format also has no answer yet to this problem; maybe the same solution could be found for both MMCIF and CIF?!

Best wishes,
George

Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-2582

On Wed, 8 Aug 2007, Ralf W. Grosse-Kunstleve wrote:

> Hi George,
> It seems to me that column 21 of an ATOM or HETATM instruction
> is always blank, and column 22 is the chain ID. So if we put a
> two-character chain ID right justified in columns 21 and 22, for the
> vast majority of structures there would be no change, and it would
> be relatively easy to change MMDB, Coot etc. to accommodate the
> increase in the possible number of chains from 26 to (say) 36^2 =
> 1296 (if digits are allowed too).

I really like this idea and I will make phenix/cctbx work this way.

> The next problem is of course the 5-digit atom serial number in columns
> 7 to 11, which limits the total number of atoms in the structure to a
> paltry 99999.
> Many programs ignore this number, but it is used by the
> CONECT, SSBOND and CISPEP records in the PDB. However column 12 also
> appears to be blank, I think that using it (e.g. with A for atoms
> 100000 to 199999, B for atoms 200000 to 299999 etc.) would enable the
> sequence numbers to be recycled and would again require no change for
> the vast majority of PDB files. I personally think that this is a much
> better solution than what the PDB currently does for more than 99999
> atoms (they spread the structure over several PDB files with different
> PDB-IDs!).

The solution to this problem is to simply treat the serial numbers and residue numbers as strings. X-PLOR/CNS has been doing this forever, maybe other programs, too. Implementations to generate intuitive, maximally backward compatible numbers can be found here:

http://cci.lbl.gov/hybrid_36/

This includes a Fortran-77 implementation without any external dependencies, heavily tested with a large variety of compilers.

I think these simple tricks will be sufficient until we are all retired!

Cheers,
Ralf
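
For illustration only, the two ideas discussed in this thread can be sketched in Python. This is not the reference code from the URL above, which remains authoritative. It assumes that hybrid_36 writes a number as plain right-justified decimal while it fits the field, then switches to base 36 with upper-case letters, and then to base 36 with lower-case letters. The function names `encode_field`, `decode_field` and `atom_prefix` are made up for this sketch.

```python
DIGITS_UPPER = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS_LOWER = DIGITS_UPPER.lower()


def encode_base36(digits, value):
    """Write a non-negative integer in base 36 using the given digit set."""
    if value == 0:
        return digits[0]
    out = []
    while value > 0:
        value, rest = divmod(value, 36)
        out.append(digits[rest])
    return "".join(reversed(out))


def encode_field(width, value):
    """Return `value` as exactly `width` characters (assumed hybrid_36 rules)."""
    if value >= 1 - 10 ** (width - 1):
        if value < 10 ** width:
            # Ordinary case: unchanged, right-justified decimal.
            return "%*d" % (width, value)
        value -= 10 ** width
        block = 26 * 36 ** (width - 1)    # strings starting with a letter
        offset = 10 * 36 ** (width - 1)   # smallest string starting with 'A'
        if value < block:
            return encode_base36(DIGITS_UPPER, value + offset)
        value -= block
        if value < block:
            return encode_base36(DIGITS_LOWER, value + offset)
    raise ValueError("value out of range for field width %d" % width)


def decode_field(width, text):
    """Inverse of encode_field for a field of exactly `width` characters."""
    if len(text) != width:
        raise ValueError("field must be %d characters wide" % width)
    first = text[0]
    if first in DIGITS_UPPER[10:]:
        return int(text, 36) - 10 * 36 ** (width - 1) + 10 ** width
    if first in DIGITS_LOWER[10:]:
        return int(text, 36) + 16 * 36 ** (width - 1) + 10 ** width
    return int(text)  # blank-padded or signed decimal


def atom_prefix(serial, atom_name, res_name, chain_id, res_seq):
    """Columns 1-26 of an ATOM record with a two-character chain ID.

    The chain ID is right justified in columns 21-22, so a one-letter
    ID still lands in column 22 and existing files are unchanged.
    """
    if not 1 <= len(chain_id) <= 2:
        raise ValueError("chain ID must be one or two characters")
    return ("ATOM  " + encode_field(5, serial) + " " + atom_name.ljust(4)
            + " " + res_name.rjust(3) + chain_id.rjust(2)
            + encode_field(4, res_seq))


print(atom_prefix(100000, " CA ", "ALA", "AB", 10000))
```

Serial numbers up to 99999 and one-letter chain IDs produce exactly the same columns as today, which is the backward compatibility argued for above. Only larger values use the letter-prefixed forms.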