Ethan Merritt wrote:
> Examples include:
> - very large structures, for which the current 80 column PDB format
> runs out of space for atom numbers (4 columns -> max 9999)
> or for chain ids (1 column -> single char A-Z 0-9)
> [don't ask me why they don't want lower case]
> - new classes of experiment (SAXS, EM)
> - new classes of model (TLS or normal-mode displacements,
> ensemble models, envelope representations)

It would be trivial to update the PDB format to handle large structures. In fact, such extensions are already being planned. Atom numbers can simply be handled by truncating them, because the sequential layout of PDB files makes the serial number redundant anyway.
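To make the truncation argument concrete, here is a minimal sketch. It assumes "truncating" means keeping only the low-order digits that fit the field, and that a reader numbers atoms by their order in the file rather than trusting the serial column. The function names `format_serial` and `read_atoms` are hypothetical, not from any existing library. The field width is a parameter; in the wwPDB v3.3 format description the ATOM/HETATM serial number occupies columns 7-11 and the atom name columns 13-16.

```python
# Hypothetical helpers illustrating why the atom serial number is redundant.

def format_serial(serial: int, width: int = 5) -> str:
    """Truncate an atom serial number to its low-order digits so it fits the field.

    Assumption: "truncating" means keeping the last `width` digits.
    """
    return str(serial % 10**width).rjust(width)


def read_atoms(lines):
    """Return (index, atom_name) pairs, numbering atoms by their order in the file.

    The serial field (columns 7-11) is never read, so truncated serial
    numbers cannot cause ambiguity.
    """
    atoms = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            atom_name = line[12:16].strip()  # columns 13-16 hold the atom name
            atoms.append((len(atoms) + 1, atom_name))
    return atoms
```

For example, `format_serial(123456, width=4)` gives `"3456"`. Because `read_atoms` never looks at the serial field, two atoms that share a truncated serial are still distinguished by their position in the file.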
As for other experiments, such as SAXS or EM, I think the PDB format should continue to be used only for atomic coordinates. Using it as a complete data reference has never worked well.

...

> Currently-maintained programs should move to mmCIF or XML, whichever
> is convenient. These formats are intrinsically open-ended, and can
> handle the problematic structures mentioned above so long as the
> corresponding mmCIF dictionaries are updated to define the relevant
> entities.

Being intrinsically open-ended is an advantage for parsing, but it still takes a lot of work to actually make use of new data: the software itself still has to be updated to handle it. Formats like mmCIF and XML solve only part of the 'file format' problem. Another problem is that mmCIF can be too open-ended, depending on how the schema is managed.

I would be much more willing to work toward switching to mmCIF if the RCSB showed more interest in collaborating with the user community. If we can't even get them involved in something as simple as the PDB format, why should we expect working with mmCIF to be any better?

Joe Krahn