Re: [Open Babel] state of OB features

Andrew Dalke Tue, 08 Jun 2010 09:04:43 -0700

On Jun 8, 2010, at 6:31 AM, Geoffrey Hutchison wrote:
> As Noel said, we do have support for stereochemistry around double bonds in 
> SMILES. Stereochemistry is much improved thanks to Noel and Tim Vandermeersch 
> in the soon-to-be-released v2.3. (SMARTS support for double-bond stereo is 
> another matter.)


Is there an expected date for that? If it's within the next couple of weeks 
then I can put in the 2.3 information.

> Yes. On any file format like PDB or XYZ which does not support bond types, 
> perception is run to determine connectivity and bond order. For PDB, this is 
> also done first via residue names. Bond perception can also be turned off via 
> the command-line or programmatically (e.g., some users run MD simulations and 
> have their own topology file).

Where is this documented? Searching for "PDB perception" on the OpenBabel site 
doesn't find much of anything about the algorithm used. My own history of 
working with the PDB says it's a lot of work to get those details, and a quick 
look at the OEChem release notes has things like:

 - Added PDB support for the following:
   - sidechain recognition for the RNA residue ‘YG’ and ‘H2U’
   - naming of PDB residue ‘BME’
   - the N-terminal modification ‘FOR’
   - the cofactor ‘FMT’ (which is “formic acid” or “formate”)

but I don't see mention of the quality/robustness of the PDB reader in OB.


> Less than 1% of the time do I have to specify a format type manually. Formats 
> can be guessed from file extensions, and for some file types (e.g., quantum 
> packages that like the .out, .log, or .dat extensions), OB will attempt to 
> guess the format from contents.


The examples I've seen are all like

====== straight openbabel
import openbabel as ob
 
obconversion = ob.OBConversion()
obconversion.SetInFormat("sdf")

 
obmol = ob.OBMol()

notatend = obconversion.ReadFile(obmol, "benzodiazepine.sdf.gz")
while notatend:
    ...
    notatend = obconversion.Read(obmol)


==== pybel

import pybel
 

for mol in pybel.readfile("sdf", "benzodiazepine.sdf.gz"):
    ...

===


where the format is explicitly specified. The documentation at

http://openbabel.org/dev-api/classOpenBabel_1_1OBConversion.shtml
under "To add automatic format conversion to an existing program."

uses

      ifstream ifs(filename); //Original code
      OBConversion conv;
      OBFormat* inFormat = conv.FormatFromExt(filename);
      OBFormat* outFormat = conv.GetFormat("ORIG");
      istream* pIn = &ifs; 
      stringstream newstream;
      if (inFormat && outFormat)
      {
         conv.SetInAndOutFormats(inFormat,outFormat);
         conv.Convert(pIn,&newstream);
         pIn=&newstream;
      }

which allows automatic format detection based on the extension, but it's a lot 
of boilerplate code.

I didn't realize I could leave the format name out and the code would 
autodetect.
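
For comparison, here is a minimal sketch of extension-based guessing done by hand at the Python level. The helper names guess_format and read_molecules are hypothetical and not part of Open Babel; the only toolkit call used is pybel.readfile(format, filename), as in the example above. It assumes the file extension is also a valid Open Babel format ID, which is not true for every extension (for instance the .out and .log files mentioned above), so treat it as an illustration rather than a replacement for the toolkit's own detection.

```python
import os

# Hypothetical helper, not part of Open Babel: derive a format ID such as
# "sdf" from a filename, ignoring a trailing compression suffix.
COMPRESSION_SUFFIXES = (".gz",)


def guess_format(filename):
    """Return the lowercased extension to use as a format ID, or None."""
    base = os.path.basename(filename)
    root, ext = os.path.splitext(base)
    if ext.lower() in COMPRESSION_SUFFIXES:
        # "benzodiazepine.sdf.gz" -> inspect "benzodiazepine.sdf" instead
        root, ext = os.path.splitext(root)
    return ext[1:].lower() or None


def read_molecules(filename):
    """Iterate over the molecules in a file, guessing the format by name."""
    # Imported lazily so guess_format works without Open Babel installed.
    import pybel  # newer releases use: from openbabel import pybel

    fmt = guess_format(filename)
    if fmt is None:
        raise ValueError("cannot guess a format for %r" % (filename,))
    return pybel.readfile(fmt, filename)
```

With this, the loop becomes "for mol in read_molecules("benzodiazepine.sdf.gz"):" and the format name no longer appears at the call site.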

>> OB also supports using a molecule as the query rather than a SMARTS.
> 
> Well, you can output a SMILES from a molecule and use that as a SMARTS. 
> That's a unit test, so we can guarantee that always works. As Chris said, 
> there's also the fastsearch format.

Perhaps that's what I was looking at. I'll have to dig into that again.

> It's not currently exposed to users, but the OBChemTsfm class is used to 
> handle pH-dependent protonation. It can handle this task too. The syntax is 
> basically reaction SMILES.

I'm assuming this will be exposed when someone volunteers to do it? ;)

> We're always open to feedback about areas of documentation needing 
> clarification. Telling us it's sketchy and/or incomplete doesn't help much. 
> Pointers to areas needing improvement will be met with applause (and fixes).

I thought it was pretty clear that the documentation in OpenBabel was sketchy, 
in comparison to some of the other toolkits, like OEChem or ChemAxon. I've 
mentioned a few of these places in this and my other responses.


> As Noel mentioned, we *are* pybel. So I think we win that comparison. Yes, 
> the C++ interface is slightly more verbose, but that's also true of C++ 
> versus Python in general.

I've been thinking about this since replying to Noel's email. OpenBabel 
publishes two different Python APIs - the one with the C++ interface and the 
Pybel interface.

The Zen of Python includes

  There should be one-- and preferably only one --obvious way to do it.

Is pybel the preferred way to do things on the Python level?


> We keep an audit log. From the command-line you get a summary:
> 
> [ghutc...@iridium]: babel tpy-Ru.sdf tpy.mol2
> 1 molecule converted
> 1 info messages 23 audit log messages 
> 
> You can programmatically interrogate the error log to get the warnings, 
> severity level, etc. The audit level is intended to cover any code which may 
> change chemical interpretation (e.g., Kekulization, adding implicit 
> hydrogens, bond perception, etc.).

That's also what OpenEye does, but getting access to the error log, 
synchronized with the reader, is nasty hard. Can someone show me how to get 
that? For example, if Pybel is the preferred way to get this data, then how do 
I get the error logs for each molecule in

for mol in pybel.readfile("sdf", "benzodiazepine.sdf.gz"):

 ?


> Hope that helps,

It does. Thanks!


                                Andrew
                                da...@dalkescientific.com


