Re: [Open Babel] state of OB features

Geoffrey Hutchison Tue, 08 Jun 2010 05:32:37 -0700

Chris and Noel have made comments already, and I generally agree with them. I 
have only added comments where I felt it was needed.


>  SMILES support is good, although it doesn't have support for stereochemistry 
> around double bonds. Excepting this lack, canonicalization is also good and 
> widely used.

As Noel said, we do have support for stereochemistry around double bonds in 
SMILES. Stereochemistry is much improved thanks to Noel and Tim Vandermeersch 
in the soon-to-be-released v2.3. (SMARTS support for double-bond stereo is 
another matter.)

>  OB does have PDB file support. I can't tell how good the chemistry 
> perception is. For example, can it detect that a C-C bond is a double or 
> triple bond instead of a single (eg, by looking at the bond length, or by 
> understanding the residue names)?

Yes. For any file format like PDB or XYZ that does not store bond types, 
perception is run to determine connectivity and bond order. For PDB, bonds are 
first assigned from residue names. Bond perception can also be turned off from 
the command line or programmatically (e.g., some users run MD simulations and 
have their own topology file).
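
To make the idea of distance-based connectivity concrete, here is a minimal 
sketch in plain Python. It is not Open Babel's implementation: the radii table, 
the 0.45 angstrom tolerance, and the function name perceive_connectivity are 
all assumptions chosen for illustration, and it only decides which atoms are 
bonded, not the bond order.

```python
import math
from itertools import combinations

# Approximate covalent radii in angstroms (illustrative values, not Open Babel's table).
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}

# Assumed slack added to the summed radii; chosen for illustration only.
TOLERANCE = 0.45


def perceive_connectivity(atoms):
    """Return index pairs (i, j) of atoms close enough to be considered bonded.

    `atoms` is a list of (element_symbol, (x, y, z)) tuples, coordinates in angstroms.
    """
    bonds = []
    for (i, (elem_i, pos_i)), (j, (elem_j, pos_j)) in combinations(enumerate(atoms), 2):
        cutoff = COVALENT_RADIUS[elem_i] + COVALENT_RADIUS[elem_j] + TOLERANCE
        if math.dist(pos_i, pos_j) <= cutoff:
            bonds.append((i, j))
    return bonds


# A water molecule as it might appear in an XYZ file (no bond records at all).
water = [
    ("O", (0.00, 0.00, 0.00)),
    ("H", (0.96, 0.00, 0.00)),
    ("H", (-0.24, 0.93, 0.00)),
]
print(perceive_connectivity(water))  # the two O-H pairs, but not H-H
```

A real perception step would go on to assign bond orders from geometry, which 
this sketch deliberately leaves out.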

>  While OB does have a nearly uniform reader API (ie, I can point it to an SD 
> file, SMILES file, etc and get molecules), and built-in gzip support, I do 
> have to specify the format type manually. That is, there's no support for 
> guessing the format based, for example, on the extension.

I have to specify a format type manually less than 1% of the time. Formats 
can be guessed from file extensions, and for some file types (e.g., quantum 
packages that like the .out, .log, or .dat extensions), OB will attempt to 
guess the format from contents.

Certainly common extensions like .pdb, .mol, .mol2, .sdf, .sdf.gz, and .pdb.gz 
are all recognized.
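
As a rough illustration of extension-based guessing with gzip handling, the 
following plain-Python sketch strips a trailing .gz and looks up the inner 
extension. The KNOWN_FORMATS table and the guess_format helper are hypothetical 
and are not how Open Babel does it internally; ambiguous extensions such as 
.out simply come back as None here, which is where content-based guessing 
would have to take over.

```python
import os

# Hypothetical extension table for illustration; Open Babel registers many more formats.
KNOWN_FORMATS = {"pdb", "mol", "mol2", "sdf", "smi", "xyz"}


def guess_format(filename):
    """Return (format_id, is_gzipped); format_id is None if the extension is unknown."""
    base = os.path.basename(filename).lower()
    is_gzipped = base.endswith(".gz")
    if is_gzipped:
        base = base[:-3]  # strip ".gz" so the inner extension decides the format
    ext = os.path.splitext(base)[1].lstrip(".")
    return (ext if ext in KNOWN_FORMATS else None), is_gzipped


for name in ("ligand.sdf.gz", "1abc.PDB", "job.out"):
    print(name, guess_format(name))
```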

>  OB also supports using a molecule as the query rather than a SMARTS.

Well, you can output a SMILES from a molecule and use that as a SMARTS. That's 
covered by a unit test, so we can guarantee that it always works. As Chris said, 
there's also the fastsearch format.

>  In addition to the support for the query languages/formats, I can't tell how 
> to use the reactions. How would I do a unimolecular reaction (eg, convert all 
> of the carbons in CCCN to OOON)? How would I use a reaction for library 
> generation (eg, convert CCC to first OCCN, then COCN, and lastly CCON)? Is it 
> even possible? I looked but didn't find it.

It's not currently exposed to users, but the OBChemTsfm class is used to handle 
pH-dependent protonation. It can handle this task too. The syntax is basically 
reaction SMILES.

> OB does not do depiction. For that case people should turn to other 
> libraries, such as OASA.

As Chris said, there is depiction in v2.3. It's evidently solid enough that 
Craig uses it for a service on eMolecules.com.

>  OB is cross-platform (here meaning "Windows and Linux"), with access to the 
> library from C++, Python, .Net and Java. The documentation is incomplete and 
> sketchy, but because OB is used by a large number of people, there is support 
> both through the mailing list and by doing a web search for others who have 
> used the code.

We're always open to feedback about areas of documentation needing 
clarification. Telling us it's sketchy and/or incomplete doesn't help much. 
Pointers to areas needing improvement will be met with applause (and fixes).

>  I have a metric for testing usability, and that's the number of lines of 
> code needed to count the total number of atoms of all of the records in an 
> input file, using one toolkit vs. pybel. OpenBabel suffers because of the 
> overhead of creating an OBConversion.

As Noel mentioned, we *are* pybel. So I think we win that comparison. Yes, the 
C++ interface is slightly more verbose, but that's also true of C++ versus 
Python in general.
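
For reference, the atom-counting metric might look roughly like this with 
pybel. The helper names count_atoms and count_atoms_in_file are made up for 
this sketch. pybel.readfile(format, filename) and Molecule.atoms are the 
standard pybel calls, but the import line assumes the Open Babel 3.x package 
layout (2.x releases used a plain "import pybel").

```python
def count_atoms(molecules):
    """Total number of atoms over any iterable of pybel-style molecules.

    Each molecule only needs an `atoms` sequence, as pybel.Molecule provides.
    """
    return sum(len(mol.atoms) for mol in molecules)


def count_atoms_in_file(filename, fmt):
    """Count atoms across all records in `filename`, read as format `fmt`."""
    # Imported lazily so count_atoms() stays usable without Open Babel installed.
    # Assumes the Open Babel 3.x layout; 2.x releases used `import pybel`.
    from openbabel import pybel

    return count_atoms(pybel.readfile(fmt, filename))
```

With Open Babel installed, count_atoms_in_file("input.sdf", "sdf") would be 
the whole program, where input.sdf is a placeholder file name.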

>  I have another metric for comparing error handling, which is to read an SD 
> file with records containing errors (format errors and chemistry errors) and 
> seeing if I can find the number of records which failed to be read in and the 
> reason for the failure. I haven't figured out how to do that with OB.

We keep an audit log. From the command-line you get a summary:

[ghutc...@iridium]: babel tpy-Ru.sdf tpy.mol2
1 molecule converted
1 info messages 23 audit log messages 

You can programmatically interrogate the error log to get the warnings, 
severity level, etc. The audit level is intended to cover any code which may 
change chemical interpretation (e.g., Kekulization, adding implicit hydrogens, 
bond perception, etc.).

Hope that helps,
-Geoff