[Open Babel] state of OB features

Andrew Dalke Mon, 07 Jun 2010 22:15:17 -0700

Hi all,

  I've been asked by a company to evaluate different toolkits for them to use 
in-house. Geoff thought it might be better to ask on the list than ask him 
directly and privately, so I'm doing that.


  I've gone through their internal requirements and now I'm seeing how they 
match up to OB. Some of my conclusions are likely wrong and based on an older 
version of OB, so I'm hoping people here can correct me.

  I'll apologize in advance that I won't be that accessible over the next 
couple of days, and likely won't be able to respond until Saturday or so. But I 
felt it was best to get it out now rather than wait.

  =====

  As with all cheminformatics toolkits, OB has ways to get access to the atoms 
and bonds of a molecule. It supports molecular editing, so that atoms and bonds 
can be added, deleted, or modified as desired. Atoms, bonds, and molecules may 
have additional user-defined data associated with them.

  OB supports coordinates as part of the molecule (meaning that deleting the 
atom deletes the associated coordinates), and it supports multiple conformer 
structures.

  OB follows the Daylight approach where it has a standard chemistry model and 
all input structures are reperceived based on that model. There is no way to 
disable that option.

  OB is the foremost program for structure format support and interconversion.

  SD file support is complete for both v2000 and v3000, for reading and 
writing. What I'm not sure about is the level of support for v3000. Is it mostly 
support for the chemistry which is in v2000 but expressed differently in v3000, 
plus support for more than 999 atoms?

  SMILES support is good, although it doesn't have support for stereochemistry 
around double bonds. Excepting this lack, canonicalization is also good and 
widely used.

  OB does have PDB file support. I can't tell how good the chemistry perception 
is. For example, can it detect that a C-C bond is a double or triple bond 
instead of a single (eg, by looking at the bond length, or by understanding the 
residue names)?

  While OB does have a nearly uniform reader API (ie, I can point it to an SD 
file, SMILES file, etc and get molecules), and built-in gzip support, I do have 
to specify the format type manually. That is, there's no support for guessing 
the format based, for example, on the extension.
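
  To make the missing feature concrete, here is a minimal sketch of what 
extension-based guessing could look like. It is plain Python and does not call 
Open Babel at all. The helper name guess_format and the extension table are 
hypothetical, chosen only for illustration, and are not taken from OB's own 
format registry.

```python
import os

# Hypothetical extension-to-format table; the names are illustrative
# and are not read from Open Babel's format registry.
KNOWN_FORMATS = {
    ".sdf": "sdf",
    ".sd": "sdf",
    ".mol": "mol",
    ".smi": "smi",
    ".pdb": "pdb",
}


def guess_format(filename):
    """Return (format_name, is_gzipped) guessed from the filename alone."""
    base = filename.lower()
    is_gzipped = base.endswith(".gz")
    if is_gzipped:
        base = base[:-3]  # drop ".gz" and look at the inner extension
    ext = os.path.splitext(base)[1]
    if ext not in KNOWN_FORMATS:
        raise ValueError("cannot guess format for %r" % filename)
    return KNOWN_FORMATS[ext], is_gzipped
```

  With this sketch, guess_format("ligands.sdf.gz") gives ("sdf", True), and the 
result could then be handed to whatever reader call takes an explicit format.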

  OpenBabel has SMARTS support, but I can't tell how complete it is. I know it 
doesn't support double bond stereochemistry, but I think it's otherwise 
complete, including recursive SMARTS. Is there anything missing?

  OB also supports using a molecule as the query rather than a SMARTS.

  Once the match is made, it's easy to get access to the matched atoms and 
bonds, and match them up to the corresponding query atoms and bonds.

  The topic I know the least about is reactions. OB supports reaction SMILES 
and SMARTS, as well as RXN files. I don't have a good idea of how good that 
support is, and it's not something I've used much, although my client does.

  Beyond the support for the query languages/formats, I also can't tell how 
to use the reactions. How would I do a unimolecular reaction (eg, convert all 
of the carbons in CCCN to OOON)? How would I use a reaction for library 
generation (eg, convert CCC to first OCCN, then COCN, and lastly CCON)? Is it 
even possible? I looked but didn't find it.
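
  The two modes asked about here can be illustrated with a toy sketch. It 
assumes the starting structure is CCCN in both cases, and it treats the 
structure as a plain string of single-letter atom symbols. That is a crude 
stand-in for real SMILES handling and uses no toolkit, so it shows only the 
intended behaviour, not how any toolkit applies a reaction. Both function names 
are hypothetical.

```python
def transform_all(symbols, old, new):
    """Exhaustive mode: rewrite every matching site, giving one product."""
    return "".join(new if s == old else s for s in symbols)


def transform_each(symbols, old, new):
    """Enumeration mode: rewrite one matching site at a time,
    giving one product per site (library generation)."""
    products = []
    for i, s in enumerate(symbols):
        if s == old:
            products.append(symbols[:i] + new + symbols[i + 1:])
    return products


print(transform_all("CCCN", "C", "O"))   # OOON
print(transform_each("CCCN", "C", "O"))  # ['OCCN', 'COCN', 'CCON']
```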

  OB does support some fingerprints. There's a linear hash fingerprint similar 
to Daylight's and two feature fingerprint implementations, although only one is 
suggested. There's no MACCS key implementation. There is no support for 
large/sparse fingerprints, and the only implemented comparison method is the 
Tanimoto similarity.
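
  For reference, the Tanimoto similarity is the number of bits set in both 
fingerprints divided by the number set in either. Below is a minimal 
toolkit-independent sketch, assuming each fingerprint is given as a set of 
on-bit indices (a representation that also suits large/sparse fingerprints). 
Returning 0.0 for two empty fingerprints is a convention chosen here, not 
something taken from OB.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 0.5
```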

  OB does not do depiction. For that case people should turn to other 
libraries, such as OASA.

  There's no MCS or scaffold identification code in OB. There is a descriptor 
framework system, support for different forcefields and minimization, and InChI 
support. There's no nomenclature support.

  OB is cross-platform (here meaning "Windows and Linux"), with access to the 
library from C++, Python, .Net and Java. The documentation is incomplete and 
sketchy, but because OB is used by a large number of people, there is support 
both through the mailing list and by doing a web search for others who have 
used the code.

  I have a metric for testing usability, and that's the number of lines of code 
needed to count the total number of atoms of all of the records in an input 
file, using one toolkit vs. pybel. OpenBabel suffers because of the overhead of 
creating an OBConversion.

  I have another metric for comparing error handling, which is to read an SD 
file with records containing errors (format errors and chemistry errors) and 
see if I can find the number of records which failed to be read in and the 
reason for each failure. I haven't figured out how to do that with OB.
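
  The shape of this test can be sketched independently of any toolkit. The 
code below splits SD file text on the "$$$$" record delimiter and tallies which 
records a parser rejects and why. The parse argument is a hypothetical stand-in 
for a toolkit's record reader and is assumed to raise an exception with a 
useful message on failure. Whether OB can be made to report failures that way 
is exactly the open question.

```python
def split_sd_records(text):
    """Split SD file text into records on the '$$$$' delimiter line."""
    record = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            yield "\n".join(record)
            record = []
        else:
            record.append(line)
    if any(line.strip() for line in record):
        yield "\n".join(record)  # trailing record with no delimiter


def tally_failures(text, parse):
    """Run parse() on each record.

    Returns (n_ok, failures), where failures is a list of
    (record_index, reason) pairs for the records parse() rejected.
    """
    n_ok = 0
    failures = []
    for index, record in enumerate(split_sd_records(text)):
        try:
            parse(record)  # hypothetical reader; assumed to raise on error
            n_ok += 1
        except Exception as err:
            failures.append((index, str(err)))
    return n_ok, failures
```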

  ====

Thanks in advance!

                                Andrew
                                da...@dalkescientific.com



_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss