Can I ask a dumb question? Just curious... Why are we now limited to 80 "columns"? In the old days, that was a limit with Fortran and punched cards. Can a "record" (whatever it's called now) be as long as we wish? Instead of compressing a lot on a PDB record line, can we lengthen it to 130 columns?
Bernie Santarsiero

On Fri, August 10, 2007 10:10 am, Warren DeLano wrote:
> Correction: Scratch what I wrote -- the PDB format does now support a
> formal charge field in columns 79-80 (1+,2+,1- etc.). Hooray!
>
> Thus, adoption of the CONECT valency convention is all it would take for
> us to be able to convey chemically-defined structures using the PDB
> format.
>
> I'll happily add two-letter chain IDs and hybrid36 to PyMOL but would
> really, really like to see valences included as well -- widespread
> adoption of that simple convention would represent a major practical
> advance for interoperability in structure-based drug discovery.
>
> Cheers,
> Warren
>
>
>> -----Original Message-----
>> From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On
>> Behalf Of Warren DeLano
>> Sent: Thursday, August 09, 2007 5:53 PM
>> To: CCP4BB@JISCMAIL.AC.UK
>> Subject: Re: [ccp4bb] PDB format survey?
>>
>> Joe,
>>
>> I feel that atom serial numbers are particularly important,
>> since they, combined with CONECT records, provide the only
>> semi-standard convention I know of for reliably encoding bond
>> valence information into a PDB file.
>>
>> single bond = bond listed once
>> double bond = bond listed twice
>> triple bond = bond listed thrice
>> aromatic bond = bond listed four times.
>>
>> This is a convention long supported by tools like MacroModel
>> and PyMOL.
>> For example, here is formamide, where the bond between
>> atoms 1 and 3 is listed twice:
>>
>> HETATM    1  C01 C=O     1       0.000  -0.020   0.000  0.00  0.00           C
>> HETATM    2  N01 C=O     1       1.268  -0.765   0.000  0.00  0.00           N
>> HETATM    3  O02 C=O     1       0.000   1.188   0.000  0.00  0.00           O
>> HETATM    4  H01 C=O     1       1.260  -1.775   0.000  0.00  0.00           H
>> HETATM    5  H02 C=O     1       2.146  -0.266   0.000  0.00  0.00           H
>> HETATM    6  H03 C=O     1      -0.946  -0.562   0.000  0.00  0.00           H
>> CONECT    1    2
>> CONECT    1    3
>> CONECT    1    3
>> CONECT    1    6
>> CONECT    2    1    4    5
>> CONECT    3    1
>> CONECT    3    1
>> CONECT    4    2
>> CONECT    5    2
>> CONECT    6    1
>>
>> I second the proposal of treating this field as a unique
>> string rather than a numeric quantity.
>>
>> Two letter chain IDs would be fine with me, but I do think we
>> could also make better use of SEGI and/or MODEL to break
>> things up while still preserving the utility of certain other
>> records (SHEET, HELIX, etc.) within their existing column definitions.
>>
>> However, we are still lacking a standard way of designating
>> formal charges, so maybe that free column could be better
>> used for encoding a formal charge, such as ["q", "t", "d",
>> "-", "+", "D", "T", "Q"] over the formal charge range of
>> [-4,-3,-2,-1,0,1,2,3,4] -- just an idea :)...
>>
>> With valences plus formal charges along with expansion of the
>> cap on atom counts, I think we could support
>> chemically-complete PDB files and push back the date of PDB
>> demise for a few more years!
>>
>> A Wiki dedicated to practical PDB file hacks and extensions
>> is a superb idea -- of course, the goal should be to
>> ultimately come up with a single well-defined standard set of
>> hacks we all agree upon by supporting them in our code.
>>
>> Cheers,
>> Warren
>>
>> -----Original Message-----
>> From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On
>> Behalf Of Joe Krahn
>> Sent: Thursday, August 09, 2007 1:15 PM
>> To: CCP4BB@JISCMAIL.AC.UK
>> Subject: Re: [ccp4bb] PDB format survey?
>>
>> Edward A. Berry wrote:
>> > Ethan A Merritt wrote:
>> >> On Wednesday 08 August 2007 20:47, Ralf W. Grosse-Kunstleve wrote:
>> >>> Implementations to generate intuitive, maximally backward compatible
>> >>> numbers can be found here:
>> >>>
>> >>> http://cci.lbl.gov/hybrid_36/
>> >>
>> >> From that URL:
>> >>
>> >> ATOM  99998  SD  MET L9999      48.231 -64.383  -9.257  1.00 11.54           S
>> >> ATOM  99999  CE  MET L9999      49.398 -63.242 -10.211  1.00 14.60           C
>> >> ATOM  A0000  N   VAL LA000      52.228 -67.689 -12.196  1.00  8.76           N
>> >> ATOM  A0001  CA  VAL LA000      53.657 -67.774 -12.458  1.00  3.40           C
>> >>
>> >> Could you please clarify this example?
>> >> Is that "A0000" a hexadecimal number, or is it a decimal number that
>> >> just happens to have an "A" in front of it?
>> >> [A-Z][0-9999] gives a larger range of values than 5 bytes of hexadecimal,
>> >> so I'm guessing it's the former. But the example is not clear.
>> >>
>> > I'm guessing the former also. A 5-digit hex number would not be
>> > backwards compatible. With this system legacy programs can still read
>> > the files with 99999 atoms or less, and anything more than that they
>> > couldn't have handled anyway. Very nice!
>> >
>> > Ed
>> I still prefer the idea of just truncating serial numbers,
>> and using an alternative to CONECT for large structures.
>> Almost nobody uses atomSerial, but it still may be parsed as
>> an integer, where the above idea could cause errors.
>> Furthermore, non-digit encoding still results in another
>> maximum, whereas truncating the numbers has no limit. The
>> truncated serial number is ambiguous only if taken out of
>> context of the complete PDB file, but PDB files are by design
>> sequential.
>>
>> Another alternative is to define an "atom-serial offset"
>> record. It can define a number which is added to all
>> subsequently parsed atom serial numbers.
>> Every ATOM/HETATM record is then perfectly valid to an older
>> program, although that program may only be able to handle one
>> chunk of atoms at a time.
>>
>> Likewise, I like the idea of a ChainID map record, which maps
>> single-letter chainIDs to longer named IDs. Each existing
>> PDB record can then be used unchanged, but files can then
>> support very long ChainID strings. The only disadvantage is
>> that old PDB readers will get confused, but at least the
>> individual record formats are not changed in a way that makes
>> them crash.
>>
>> I think that keeping the old record definitions completely
>> unchanged is an important feature of any PDB format revision.
>> Even if we continue to use it for another 20 years, its
>> primary advantage is that it is a well-established "legacy"
>> format. If we change existing records, we break that one
>> useful feature. Therefore, I think that any changes to
>> existing records should be limited to using character
>> positions that are currently unused. (The one exception is
>> that we need to make the HEADER Y2K compatible by using a
>> 4-digit year, which means the existing decade+year characters
>> have to be moved.)
>>
>> Of course, the more important issue is that the final
>> decision needs community involvement, and not just a decision
>> by a small group of RCSB or wwPDB administrators.
>>
>> Maybe it would be useful to set up a PDB format "Wiki" where
>> alternatives can be defined, along with advantages and
>> disadvantages. If there was sufficient agreement, it could be
>> used as a community tool to put together a draft revision of
>> the next PDB format. With any luck, some RCSB or wwPDB people
>> would participate as well.
>>
>> Joe Krahn
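
For anyone who wants to experiment with the CONECT valence convention quoted above (a bond listed once is single, twice is double, thrice is triple, four times is aromatic), here is a minimal Python sketch. It is not taken from PyMOL or MacroModel. It assumes plain decimal atom serial numbers in the standard five-character CONECT fields (columns 7-11 for the source atom, then up to four bonded atoms), so it would not read hybrid-36 serials. Because the example in the thread lists each bond from both of its atoms, the sketch counts each direction separately and keeps the larger count. The names conect_bond_orders, ORDER_NAMES and EXAMPLE are made up for illustration.

```python
from collections import Counter

# Mapping assumed from the convention quoted in the thread.
ORDER_NAMES = {1: "single", 2: "double", 3: "triple", 4: "aromatic"}


def conect_bond_orders(pdb_text):
    """Return {(i, j): times_listed} with i < j, read from CONECT records."""
    directed = Counter()
    for line in pdb_text.splitlines():
        if not line.startswith("CONECT"):
            continue
        # Five fixed-width fields: source serial, then up to four partners.
        fields = [line[start:start + 5].strip() for start in range(6, 31, 5)]
        serials = [int(f) for f in fields if f]  # decimal serials assumed
        if len(serials) < 2:
            continue
        source, partners = serials[0], serials[1:]
        for partner in partners:
            directed[(source, partner)] += 1

    # A bond is normally listed from both atoms, so take the larger
    # per-direction count instead of summing the two directions.
    orders = {}
    for (a, b), count in directed.items():
        key = (min(a, b), max(a, b))
        orders[key] = max(orders.get(key, 0), count)
    return orders


# CONECT records from the six-atom example in the thread.
EXAMPLE = "\n".join([
    "CONECT    1    2",
    "CONECT    1    3",
    "CONECT    1    3",
    "CONECT    1    6",
    "CONECT    2    1    4    5",
    "CONECT    3    1",
    "CONECT    3    1",
    "CONECT    4    2",
    "CONECT    5    2",
    "CONECT    6    1",
])

if __name__ == "__main__":
    for (a, b), count in sorted(conect_bond_orders(EXAMPLE).items()):
        print(a, b, ORDER_NAMES.get(count, "unknown"))
```

On the example records this reports the bond between atoms 1 and 3 as double and every other bond as single. Slicing fixed columns rather than splitting on whitespace is a deliberate choice: with five-digit serials the fields run together and a whitespace split would misread them.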