Thanks Geoff - I've been meaning to write to the list about this, but haven't made the time. But I'll make the time now since there are other facets of this pull request that deserve discussion.

My goal in putting the pull request together was to make it so that openbabel doesn't break molecules. Many of us use openbabel extensively to convert between formats as part of our workflows, and I think it is eminently reasonable to expect the molecule that comes out is the same as the one you put in.

Towards this end, I put together a set of 13,997 ligands extracted from the PDB (getting the SDF file, which is more nicely processed than the raw PDB). I then wrote round-tripping code that makes sure the molecule stays the same when converting between file formats. There were initially thousands of failures and now there are none. You can see the pull request for the various minor bug fixes (particularly to bond typing), but the main thing was adding additional information to file formats.

PDB
By default bond orders are now output using multiple CONECT records. There is an option to revert to single CONECT records. I don't think this is technically a violation of the standard and preserves essential information. It's also a convention that several other packages (e.g. RDKit) use.

Mol2
Formal charges are output using UNIT_ATOM_ATTR records. This is part of the standard and there is an option to disable this output.

SDF
Reading the CTFile Formats document (https://www.daylight.com/meetings/mug05/Kappler/ctfile.pdf) I do not see how openbabel's interpretation of the atom alias record is correct. openbabel treats it like a group abbreviation or substitution (the atom is replaced by the functional group named in the alias record). But it really seems like it should just be specifying an alternate name for the atom, and this is exactly what SDF files from the PDB use it for. Treating the atom name as a function group replacement causes serious issues with the atom name happens to equal a functional group name. Nonetheless, to maintain as much backwards compatibility as possible, I've made it so that replacement still happens if the atom is a wildcard type. I have not included an option for the old behavior since I see no evidence it is standards compliant.

Overall, there are some breaks from previous behavior, but the 3.0 release, which is breaking backwards compatibility in multiple ways, is the ideal time to implement these relatively minor changes. I don't think you will ever be able to fully trust openbabel converting between popular formats without changes like these (bond order and charge information is too important).

Thanks,

David Koes

Assistant Professor
Computational & Systems Biology
University of Pittsburgh

On 7/29/19 4:11 PM, Geoffrey Hutchison wrote:
David Koes has contributed a pull request that fixes a bunch of file handling 
errors via round-trip testing.

One thing he's implemented is to bring back writing PDB files with multiple 
bond orders via repeated CONECT records.

Before I consider the rest of the patch, I want to know opinions on this. It's been a few years 
since OB reverted to "standard" PDB output and omitted only one CONECT per bond 
connection. This matches the practice of the PDB itself, although the "bond order" 
practice is fairly wide-spread.

David feels strongly that OB should default to write files with bond order 
information.

I can't find the threads of previous discussion, which suggests some of it was 
in the SourceForge bug tracker - I can't find anything pro or con from that 
discussion, only that it was eventually decided to stick with one CONECT per 
bond.
- Do you favor single CONECT records?
- Do you favor bond order information?
- What do you feel are pro or cons?

I'd like an open discussion. If it's possible, I know many people expect PDB 
files to support bond orders.

-Geoff



_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

Reply via email to