Hi Joe,

as both a contributor to the data and a user of the PDB files, I think you are a bit harsh in generalizing your frustration with the PDB. I have communicated extensively with the PDB folks, both at Rutgers and at EBI, about many of the shortcomings in deposition and data mining that you are describing now. I have been annoyed at times too, but many things have in fact been fixed.
But there is, as you note, a very diverse user community, and each group has a different opinion of what matters to it and what does not. The PDB has been listening to user comments: some can be addressed readily, while others would clash with many other users' views. On top of that, the PDB has to deal with a very sensitive community of submitters (imagine the outcry if the PDB did some really strict validation and added to the PDB file the caustic remarks that some structures deserve). One result would be a lot fewer Nature papers, btw....

The developers are an equally important and opinion-rich group, and by changing anything format-related you generally break quite a few things for them. As they are key to the science (no programs -> no structures -> no nagging bioinformaticists), they need to be part of any plan for future developments.

A general rethinking of how structural information is represented, beyond the format discussion and at an abstract object level, is certainly advisable for the future. Extensions such as mesh or grid objects can capture much, from e-density to EM masks to SAXS obloids and shapes, and go beyond the beloved flat-file atom representation.

Anyhow, if I understand correctly, your suggestion of polling the user community is, on a grander scale, already on the minds of the PDB folks. I am sure your voice will be heard.

Best regards, BR

Bernhard Rupp
ACA Data Standards and Computing Committee
001 (925) 209-7429
+43 (676) 571-0536
http://www.ruppweb.org/
-----Original Message-----
From: CCP4 bulletin board On Behalf Of Joe Krahn
Sent: Wednesday, August 01, 2007 5:17 PM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] PDB format survey?

Ethan Merritt wrote:
> Examples include:
> - very large structures, for which the current 80 column PDB format
>   runs out of space for atom numbers (4 columns -> max 9999)
>   or for chain ids (1 column -> single char A-Z 0-9)
>   [don't ask me why they don't want lower case]
> - new classes of experiment (SAXS, EM)
> - new classes of model (TLS or normal-mode displacements,
>   ensemble models, envelope representations)

It would be trivial to update the PDB format to handle large structures. In fact, such extensions are already being planned. Atom numbers can simply be handled by truncating them; the serial design of PDB files makes them redundant.

As for other experiments, like SAXS or EM, I think the PDB format should continue to be used only for atomic coordinates. Using it as a complete data reference has never worked well.
...

> Currently-maintained programs should move to mmCIF or XML, whichever
> is convenient. These formats are intrinsically open-ended, and can
> handle the problematic structures mentioned above so long as the
> corresponding mmCIF dictionaries are updated to define the relevant
> entities.

Being intrinsically open-ended is an advantage for parsing, but it still takes a lot of work to actually make use of new data: the software still has to be updated to handle it. Formats like mmCIF and XML only address part of the 'file format' issue. One problem is that mmCIF can be too open-ended, depending on how the schema is managed. I would be much more willing to work toward switching to mmCIF if the RCSB showed more interest in collaborating with the user community.
If we can't even get involvement in something as simple as the PDB format, why should we think working with mmCIF will be any better?

Joe Krahn
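Joe's point that atom serial numbers can simply be truncated, because the order of records already identifies each atom, can be illustrated with a minimal Python sketch. It assumes the published fixed-column ATOM record layout (serial in columns 7-11, atom name in 13-16, chain ID in column 22, coordinates in 31-54); the helper names `format_atom_line` and `read_atoms` are hypothetical, and atom-name alignment is deliberately simplified. It is an illustration of the idea, not how any particular program or the PDB handles it.

```python
from typing import Iterable, List, Tuple

# Assumption: the serial field spans columns 7-11, i.e. at most five digits.
MAX_SERIAL_FIELD = 100_000


def format_atom_line(serial: int, name: str, res_name: str, chain: str,
                     res_seq: int, x: float, y: float, z: float,
                     occupancy: float = 1.0, b_factor: float = 0.0,
                     element: str = "") -> str:
    """Format one fixed-column ATOM record (hypothetical helper).

    The serial is truncated to its low-order five digits so the field
    never overflows; the one-character chain ID cannot be rescued this way.
    """
    if len(chain) != 1:
        raise ValueError("PDB chain ID must be exactly one character")
    serial_field = serial % MAX_SERIAL_FIELD
    # Simplified alignment: names shorter than four characters start in column 14.
    name_field = name if len(name) == 4 else f" {name:<3}"
    return (f"ATOM  {serial_field:>5} {name_field}{'':1}{res_name:>3} {chain}"
            f"{res_seq:>4}{'':1}   {x:8.3f}{y:8.3f}{z:8.3f}"
            f"{occupancy:6.2f}{b_factor:6.2f}{'':10}{element:>2}")


def read_atoms(lines: Iterable[str]) -> List[Tuple[int, str, float, float, float]]:
    """Return (ordinal, atom name, x, y, z) for each ATOM/HETATM record.

    Atoms are numbered by their position in the file; columns 7-11
    (the serial field) are never read, so wrapped serials are harmless.
    """
    atoms: List[Tuple[int, str, float, float, float]] = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            ordinal = len(atoms) + 1
            atom_name = line[12:16].strip()
            x, y, z = (float(line[i:i + 8]) for i in (30, 38, 46))
            atoms.append((ordinal, atom_name, x, y, z))
    return atoms


if __name__ == "__main__":
    records = [format_atom_line(s, "CA", "ALA", "A", 1, 1.0, 2.0, -3.5, element="C")
               for s in (99_999, 100_000, 123_456)]
    for record in records:
        print(record)
    print(read_atoms(records))
```

A reader written this way depends only on record order, which is the redundancy Joe refers to; the chain ID has no such fallback, so the sketch rejects multi-character IDs instead of silently widening the record.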