I'm on the fence. Usually I'd say avoid extensions by default, but this
looks like a common extension that preserves information. But I can't
really say as I don't deal with PDB files. But if you do go ahead with
this, it's important that there is an option to turn it off in case it does
cause problems for anyone.

On Mon, 29 Jul 2019 at 22:01, David Koes <dk...@pitt.edu> wrote:

> Thanks Geoff - I've been meaning to write to the list about this, but
> haven't made the time.  But I'll make the time now
> since there are other facets of this pull request that deserve discussion.
>
> My goal in putting the pull request together was to make it so that
> openbabel doesn't break molecules.  Many of us use
> openbabel extensively to convert between formats as part of our workflows,
> and I think it is eminently reasonable to
> expect the molecule that comes out is the same as the one you put in.
>
> Towards this end, I put together a set of 13,997 ligands extracted from
> the PDB (getting the SDF file, which is more
> nicely processed than the raw PDB).  I then wrote round-tripping code that
> makes sure the molecule stays the same when
> converting between file formats.  There were initially thousands of
> failures and now there are none.  You can see the
> pull request for the various minor bug fixes (particularly to bond
> typing), but the main thing was adding additional
> information to file formats.
>
> PDB
> By default bond orders are now output using multiple CONECT records.
> There is an option to revert to single CONECT
> records.  I don't think this is technically a violation of the standard
> and preserves essential information.  It's also
> a convention that several other packages (e.g. RDKit) use.
>
> Mol2
> Formal charges are output using UNIT_ATOM_ATTR records.  This is part of
> the standard and there is an option to disable
> this output.
>
> SDF
> Reading the CTFile Formats document (
> https://www.daylight.com/meetings/mug05/Kappler/ctfile.pdf) I do not see
> how
> openbabel's interpretation of the atom alias record is correct.  openbabel
> treats it like a group abbreviation or
> substitution (the atom is replaced by the functional group named in the
> alias record).  But it really seems like it
> should just be specifying an alternate name for the atom, and this is
> exactly what SDF files from the PDB use it for.
> Treating the atom name as a function group replacement causes serious
> issues with the atom name happens to equal a
> functional group name.  Nonetheless, to maintain as much backwards
> compatibility as possible, I've made it so that
> replacement still happens if the atom is a wildcard type.  I have not
> included an option for the old behavior since I
> see no evidence it is standards compliant.
>
> Overall, there are some breaks from previous behavior, but the 3.0
> release, which is breaking backwards compatibility in
> multiple ways, is the ideal time to implement these relatively minor
> changes.  I don't think you will ever be able to
> fully trust openbabel converting between popular formats without changes
> like these (bond order and charge information
> is too important).
>
> Thanks,
>
> David Koes
>
> Assistant Professor
> Computational & Systems Biology
> University of Pittsburgh
>
> On 7/29/19 4:11 PM, Geoffrey Hutchison wrote:
> > David Koes has contributed a pull request that fixes a bunch of file
> handling errors via round-trip testing.
> >
> > One thing he's implemented is to bring back writing PDB files with
> multiple bond orders via repeated CONECT records.
> >
> > Before I consider the rest of the patch, I want to know opinions on
> this. It's been a few years since OB reverted to "standard" PDB output and
> omitted only one CONECT per bond connection. This matches the practice of
> the PDB itself, although the "bond order" practice is fairly wide-spread.
> >
> > David feels strongly that OB should default to write files with bond
> order information.
> >
> > I can't find the threads of previous discussion, which suggests some of
> it was in the SourceForge bug tracker - I can't find anything pro or con
> from that discussion, only that it was eventually decided to stick with one
> CONECT per bond.
> > - Do you favor single CONECT records?
> > - Do you favor bond order information?
> > - What do you feel are pro or cons?
> >
> > I'd like an open discussion. If it's possible, I know many people expect
> PDB files to support bond orders.
> >
> > -Geoff
> >
>
>
> _______________________________________________
> OpenBabel-Devel mailing list
> openbabel-de...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/openbabel-devel
>
_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

Reply via email to