[ccp4bb] PDB format survey?

Santarsiero, Bernard D. Fri, 10 Aug 2007 08:17:22 -0700

Can I ask a dumb question? Just curious...

Why are we now limited to 80 "columns"? In the old days, that was a limit
with Fortran and punched cards. Can a "record" (whatever it's called now)
be as long as we wish? Instead of compressing a lot on a PDB record line,
can we lengthen it to 130 columns?



Bernie Santarsiero


On Fri, August 10, 2007 10:10 am, Warren DeLano wrote:
> Correction:  Scratch what I wrote -- the PDB format does now support a
> formal charge field in columns 79-80 (1+,2+,1- etc.).  Hooray!
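A minimal Python sketch of reading and writing that charge field. It assumes the digit-then-sign form ("2+", "1-") in columns 79-80 and treats a blank field as a neutral atom. The function names are hypothetical, and the charge on the example record is only illustrative.

```python
def parse_formal_charge(record: str) -> int:
    """Return the formal charge stored in columns 79-80 of an ATOM/HETATM record."""
    field = record[78:80].strip()
    if not field:
        return 0  # blank field taken to mean a neutral atom (assumption)
    if len(field) != 2 or not field[0].isdigit() or field[1] not in "+-":
        raise ValueError(f"unexpected charge field {field!r}")
    magnitude = int(field[0])
    return magnitude if field[1] == "+" else -magnitude


def format_formal_charge(charge: int) -> str:
    """Return the two-character field for columns 79-80."""
    if charge == 0:
        return "  "
    if abs(charge) > 9:
        raise ValueError("charge does not fit in one digit")
    return f"{abs(charge)}{'+' if charge > 0 else '-'}"


# An example 80-column record with an illustrative "1+" in columns 79-80.
nitrogen = ("HETATM    2  N01 C=O     1       1.268  -0.765   0.000"
            "  0.00  0.00           N1+")
print(parse_formal_charge(nitrogen))  # 1
```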
>
> Thus, adoption of the CONECT valency convention is all it would take for
> us to be able to convey chemically-defined structures using the PDB
> format.
>
> I'll happily add two-letter chain IDs and hybrid36 to PyMOL but would
> really, really like to see valences included as well -- widespread
> adoption of that simple convention would represent a major practical
> advance for interoperability in structure-based drug discovery.
>
> Cheers,
> Warren
>
>
>> -----Original Message-----
>> From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On
>> Behalf Of Warren DeLano
>> Sent: Thursday, August 09, 2007 5:53 PM
>> To: CCP4BB@JISCMAIL.AC.UK
>> Subject: Re: [ccp4bb] PDB format survey?
>>
>> Joe,
>>
>> I feel that atom serial numbers are particularly important,
>> since they, combined with CONECT records, provide the only
>> semi-standard convention I know of for reliably encoding bond
>> valence information into a PDB file.
>>
>> single bond = bond listed once
>> double bond = bond listed twice
>> triple bond = bond listed thrice
>> aromatic bond = bond listed four times.
>>
>> This is a convention long supported by tools like MacroModel
>> and PyMOL. For example, here is formamide, where the C=O bond
>> between atoms 1 and 3 is listed twice:
>>
>> HETATM    1  C01 C=O     1       0.000  -0.020   0.000  0.00  0.00           C
>> HETATM    2  N01 C=O     1       1.268  -0.765   0.000  0.00  0.00           N
>> HETATM    3  O02 C=O     1       0.000   1.188   0.000  0.00  0.00           O
>> HETATM    4  H01 C=O     1       1.260  -1.775   0.000  0.00  0.00           H
>> HETATM    5  H02 C=O     1       2.146  -0.266   0.000  0.00  0.00           H
>> HETATM    6  H03 C=O     1      -0.946  -0.562   0.000  0.00  0.00           H
>> CONECT    1    2
>> CONECT    1    3
>> CONECT    1    3
>> CONECT    1    6
>> CONECT    2    1    4    5
>> CONECT    3    1
>> CONECT    3    1
>> CONECT    4    2
>> CONECT    5    2
>> CONECT    6    1
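A minimal Python sketch of reading this convention back into bond orders. It assumes fixed-column CONECT records (origin serial in columns 7-11, up to four partner serials in columns 12-31) and that each bond is repeated from both of its atoms, as in the example. The function name is hypothetical; this is not PyMOL's or MacroModel's code.

```python
from collections import Counter

BOND_NAMES = {1: "single", 2: "double", 3: "triple", 4: "aromatic"}


def conect_bond_orders(lines):
    """Map each bonded atom pair (low, high) to how many times it is listed."""
    listings = Counter()
    for line in lines:
        if not line.startswith("CONECT"):
            continue
        # Origin serial in columns 7-11, partners in 12-16, 17-21, 22-26, 27-31.
        fields = [line[i:i + 5].strip() for i in range(6, 31, 5)]
        serials = [int(f) for f in fields if f]
        if not serials:
            continue
        origin = serials[0]
        for partner in serials[1:]:
            listings[(origin, partner)] += 1
    orders = {}
    for (a, b), count in listings.items():
        pair = (min(a, b), max(a, b))
        # The bond is listed from both ends; keep the larger count.
        orders[pair] = max(orders.get(pair, 0), count)
    return orders


EXAMPLE_CONECT = """\
CONECT    1    2
CONECT    1    3
CONECT    1    3
CONECT    1    6
CONECT    2    1    4    5
CONECT    3    1
CONECT    3    1
CONECT    4    2
CONECT    5    2
CONECT    6    1""".splitlines()

for (a, b), order in sorted(conect_bond_orders(EXAMPLE_CONECT).items()):
    print(a, b, BOND_NAMES.get(order, f"order {order}"))
```

Only the pair (1, 3) comes out as a double bond; every other pair is listed once from each end and reads as single.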
>>
>> I second the proposal of treating this field as a unique
>> string rather than a numeric quantity.
>>
>> Two-letter chain IDs would be fine with me, but I do think we
>> could also make better use of SEGI and/or MODEL to break
>> things up while still preserving the utility of certain other
>> records (SHEET, HELIX, etc.) within their existing column definitions.
>>
>> However, we are still lacking a standard way of designating
>> formal charges, so maybe that free column could be better
>> used for encoding a formal charge, such as ["q", "t", "d",
>> "-", "+", "D", "T", "Q"] over the formal charge range of
>> [-4,-3,-2,-1,0,1,2,3,4] -- just an idea :)...
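The one-character idea amounts to a lookup table: lower case for negative charges, upper case for positive, the letter giving the magnitude. The proposal lists eight symbols for nine values, so using a blank for zero is an assumption here, as are the names in this sketch.

```python
# Symbols from the proposal; " " for a neutral atom is an assumption.
CHARGE_FROM_CODE = {"q": -4, "t": -3, "d": -2, "-": -1, " ": 0,
                    "+": 1, "D": 2, "T": 3, "Q": 4}
CODE_FROM_CHARGE = {charge: code for code, charge in CHARGE_FROM_CODE.items()}


def decode_charge(code: str) -> int:
    """Translate one proposed charge character into an integer charge."""
    try:
        return CHARGE_FROM_CODE[code]
    except KeyError:
        raise ValueError(f"unknown charge code {code!r}") from None


def encode_charge(charge: int) -> str:
    """Translate an integer charge in [-4, 4] into one character."""
    try:
        return CODE_FROM_CHARGE[charge]
    except KeyError:
        raise ValueError(f"charge {charge} outside [-4, 4]") from None
```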
>>
>> With valences plus formal charges along with expansion of the
>> cap on atom counts, I think we could support
>> chemically-complete PDB files and push back the date of PDB
>> demise for a few more years!
>>
>> A Wiki dedicated to practical PDB file hacks and extensions
>> is a superb idea -- of course, the goal should be to
>> ultimately come up with a single well-defined standard set of
>> hacks we all agree upon by supporting them in our code.
>>
>> Cheers,
>> Warren
>>
>> -----Original Message-----
>> From: CCP4 bulletin board [mailto:[EMAIL PROTECTED] On
>> Behalf Of Joe Krahn
>> Sent: Thursday, August 09, 2007 1:15 PM
>> To: CCP4BB@JISCMAIL.AC.UK
>> Subject: Re: [ccp4bb] PDB format survey?
>>
>> Edward A. Berry wrote:
>> > Ethan A Merritt wrote:
>> >> On Wednesday 08 August 2007 20:47, Ralf W. Grosse-Kunstleve wrote:
>> >>> Implementations to generate intuitive, maximally backward
>> >>> compatible numbers can be found here:
>> >>>
>> >>>   http://cci.lbl.gov/hybrid_36/
>> >>
>> >> From that URL:
>> >>
>> >> ATOM  99998  SD  MET L9999      48.231 -64.383  -9.257  1.00 11.54           S
>> >> ATOM  99999  CE  MET L9999      49.398 -63.242 -10.211  1.00 14.60           C
>> >> ATOM  A0000  N   VAL LA000      52.228 -67.689 -12.196  1.00  8.76           N
>> >> ATOM  A0001  CA  VAL LA000      53.657 -67.774 -12.458  1.00  3.40           C
>> >>
>> >> Could you please clarify this example?
>> >> Is that "A0000" a hexadecimal number, or is it a decimal number that
>> >> just happens to have an "A" in front of it?
>> >> [A-Z][0-9999] gives a larger range of values than 5 bytes of hexadecimal,
>> >> so I'm guessing it's the former.  But the example is not clear.
>> >>
>> > I'm guessing the former also. A 5-digit hex number would not be
>> > backwards compatible. With this system legacy programs can still read
>> > the files with 99999 atoms or less, and anything more than that they
>> > couldn't have handled anyway. Very nice!
>> >
>> > Ed
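To make the example concrete: as described on the hybrid_36 page, "A0000" is neither hexadecimal nor an "A" glued onto a decimal number. Values that fit stay plain decimal (up to 99999 in a five-character field, 9999 in four); larger values switch to base-36 digits (0-9, A-Z) whose first character is a letter, and then to 0-9, a-z once the upper-case range is used up. A minimal Python reimplementation, written independently of the reference code at that URL; the function names are hypothetical.

```python
import string

UPPER = string.digits + string.ascii_uppercase
LOWER = string.digits + string.ascii_lowercase


def _base36(value: int, digits: str) -> str:
    out = ""
    while value:
        value, rem = divmod(value, 36)
        out = digits[rem] + out
    return out


def hy36_encode(width: int, value: int) -> str:
    """Encode value into a field of `width` characters."""
    if value < 10 ** width:
        if value <= -(10 ** (width - 1)):
            raise ValueError("value too small for field")
        return str(value).rjust(width)       # plain decimal, as before
    block = 26 * 36 ** (width - 1)           # values available per letter case
    offset = 10 * 36 ** (width - 1)          # skip base-36 numbers starting 0-9
    value -= 10 ** width
    for digits in (UPPER, LOWER):
        if value < block:
            return _base36(value + offset, digits)
        value -= block
    raise ValueError("value too large for field")


def hy36_decode(width: int, field: str) -> int:
    """Decode a field written by hy36_encode."""
    if len(field) != width:
        raise ValueError(f"expected {width} characters: {field!r}")
    block = 26 * 36 ** (width - 1)
    offset = 10 * 36 ** (width - 1)
    if field[0] in string.ascii_uppercase and all(c in UPPER for c in field):
        return int(field, 36) - offset + 10 ** width
    if field[0] in string.ascii_lowercase and all(c in LOWER for c in field):
        return int(field, 36) - offset + 10 ** width + block
    return int(field)                        # decimal, possibly space-padded


print(hy36_encode(5, 99999), hy36_encode(5, 100000), hy36_encode(4, 10000))
print(hy36_decode(5, "A0001"))
```

In a five-character field this reaches 87,440,031, while any file of up to 99,999 atoms still contains only ordinary integers.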
>> I still prefer the idea of just truncating serial numbers,
>> and using an alternative to CONECT for large structures.
>> Almost nobody uses atomSerial, but it may still be parsed as
>> an integer, in which case the above idea could cause errors.
>> Furthermore, non-digit encoding still results in another
>> maximum, whereas truncating the numbers has no limit. The
>> truncated serial number is ambiguous only if taken out of
>> the context of the complete PDB file, but PDB files are by
>> design sequential.
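The truncation idea relies on file order to recover the full number. A minimal Python sketch, assuming serial numbers increase through the file so that any decrease marks a wrap past 99999; the function names are hypothetical.

```python
def truncated_serial(serial: int) -> str:
    """Keep only the last five digits of an atom serial number."""
    return f"{serial % 100000:5d}"


def unwrap_serials(fields):
    """Rebuild full serial numbers from truncated fields read in file order."""
    wraps, previous, full = 0, None, []
    for field in fields:
        value = int(field)
        if previous is not None and value < previous:
            wraps += 1  # the counter rolled over from 99999 back to 0
        full.append(value + 100000 * wraps)
        previous = value
    return full


fields = [truncated_serial(s) for s in range(99998, 100003)]
print(fields)
print(unwrap_serials(fields))
```

A single record read in isolation cannot be unwrapped this way, which is the ambiguity mentioned above.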
>>
>> Another alternative is to define an "atom-serial offset"
>> record. It can define a number which is added to all
>> subsequently parsed atom serial numbers. Every ATOM/HETATM
>> record is then perfectly valid to an older program, although that
>> program may only be able to handle one chunk of atoms at a time.
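A minimal reader-side sketch of the offset idea. The record is not named or laid out in the proposal, so the name SEROFF and the "record name, then an integer" layout are hypothetical, and each new offset record is assumed to replace the previous offset rather than add to it.

```python
OFFSET_RECORD = "SEROFF"  # hypothetical record name, not part of any PDB spec


def offset_serials(lines):
    """Return ATOM/HETATM serials with the most recent offset added."""
    offset, serials = 0, []
    for line in lines:
        record = line[:6]
        if record == OFFSET_RECORD:
            offset = int(line[6:].split()[0])
        elif record in ("ATOM  ", "HETATM"):
            serials.append(offset + int(line[6:11]))  # columns 7-11
    return serials


# Records shortened after the residue name for brevity.
example = [
    "ATOM  99999  CE  MET",
    "SEROFF 100000",
    "ATOM      0  N   VAL",
    "ATOM      1  CA  VAL",
]
print(offset_serials(example))
```

A reader that ignores the unknown record sees serial numbers restart at 0, which is the "one chunk at a time" limitation.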
>>
>> Likewise, I like the idea of a ChainID map record, which maps
>> single-letter chainIDs to longer named IDs. Each existing
>> PDB record can then be used unchanged, but files can then
>> support very long ChainID strings. The only disadvantage is
>> that old PDB readers will get confused, but at least the
>> individual record formats are not changed in a way that makes
>> them crash.
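A minimal sketch of resolving such a map while reading. The record name CHNMAP and its "short ID, then long name" layout are hypothetical, since the proposal does not define them; ATOM/HETATM records are left untouched and keep their one-character chain ID in column 22.

```python
MAP_RECORD = "CHNMAP"  # hypothetical record name, not part of any PDB spec


def long_chain_ids(lines):
    """Return the long chain name for each ATOM/HETATM record, in file order."""
    names, result = {}, []
    for line in lines:
        if line.startswith(MAP_RECORD):
            short_id, long_name = line[6:].split()[:2]
            names[short_id] = long_name
        elif line.startswith(("ATOM  ", "HETATM")):
            chain = line[21]  # column 22 holds the one-character chain ID
            result.append(names.get(chain, chain))
    return result


example = [
    "CHNMAP L LIGHT_CHAIN_1",
    "ATOM  99998  SD  MET L9999      48.231 -64.383  -9.257  1.00 11.54           S",
    "ATOM  A0000  N   VAL LA000      52.228 -67.689 -12.196  1.00  8.76           N",
    "HETATM    1  C01 C=O     1       0.000  -0.020   0.000  0.00  0.00           C",
]
print(long_chain_ids(example))
```

Unmapped chain IDs, including a blank one, pass through unchanged.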
>>
>> I think that keeping the old record definitions completely
>> unchanged is an important feature of any PDB format revision.
>> Even if we continue to use it for another 20 years, its primary
>> advantage is that it is a well-established "legacy" format. If
>> we change existing records, we break that one useful feature.
>> Therefore, I think that any changes to existing records
>> should be limited to using character positions that are
>> currently unused. (The one exception is that we need to make
>> the HEADER Y2K compatible by using a 4-digit year, which means
>> the existing decade+year characters have to be moved.)
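For that HEADER exception: the deposition date is currently written as DD-MMM-YY in columns 51-59. A minimal sketch of expanding it to a four-digit year. The pivot that sends 70-99 to the 1900s is an assumption chosen for illustration, not a PDB rule, and DD-MMM-YYYY is only one possible wider layout.

```python
from datetime import date

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def expand_pdb_date(text: str, pivot: int = 70) -> date:
    """Turn a DD-MMM-YY date into a datetime.date with a four-digit year."""
    day, month, yy = text.strip().split("-")
    year = int(yy)
    year += 1900 if year >= pivot else 2000  # assumed pivot, not a PDB rule
    return date(year, MONTHS.index(month.upper()) + 1, int(day))


def header_date_4digit(text: str) -> str:
    """Re-emit the date as DD-MMM-YYYY, two characters wider than before."""
    d = expand_pdb_date(text)
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year:04d}"


print(header_date_4digit("09-AUG-07"))
```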
>>
>> Of course, the more important issue is that the final
>> decision needs community involvement, and not just a decision
>> by a small group of RCSB or wwPDB administrators.
>>
>> Maybe it would be useful to set up a PDB format "Wiki" where
>> alternatives can be defined, along with advantages and
>> disadvantages. If there was sufficient agreement, it could be used as a
>> community tool to put together a draft revision of the next
>> PDB format. With any luck, some RCSB or wwPDB people would
>> participate as well.
>>
>> Joe Krahn
>>
>>
>>
>>
>
