Hi all, I've been asked by a company to evaluate different toolkits for them to use in-house. Geoff thought it might be better to ask on the list than ask him directly and privately, so I'm doing that.
I've gone through their internal requirements, and now I'm seeing how they match up to OB. Some of my conclusions are likely wrong and based on an older version of OB, so I'm hoping people here can correct me.

I'll apologize in advance that I won't be that accessible over the next couple of days, and likely won't be able to respond until Saturday or so. But I felt it was best to get it out now rather than wait.

=====

As with all cheminformatics toolkits, OB has ways to get access to the atoms and bonds of a molecule. It supports molecular editing, so that atoms and bonds can be added, deleted, or modified as desired. Atoms, bonds, and molecules may have additional user-defined data associated with them. OB supports coordinates as part of the molecule (meaning that deleting the atom deletes the associated coordinates), and it supports multiple conformer structures.

OB follows the Daylight approach where it has a standard chemistry model and all input structures are reperceived based on that model. There is no way to disable that reperception.

OB is the foremost program for structure format support and interconversion. SD file support is complete for v2000 and v3000, both for reading and writing. What I'm not sure about is the level of support for v3000. Is it mostly support for the chemistry which is in v2000 but expressed differently in v3000, plus support for more than 999 atoms?

SMILES support is good, although it doesn't have support for stereochemistry around double bonds. Excepting this lack, canonicalization is also good and widely used.

OB does have PDB file support. I can't tell how good the chemistry perception is. For example, can it detect that a C-C bond is a double or triple bond instead of a single (eg, by looking at the bond length, or by understanding the residue names)?
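To make the bond-length part of that PDB question concrete, here is a minimal sketch of the kind of perception being asked about. It is not how OB does it (that is the open question); the function name and the cutoff values are assumptions chosen for illustration, placed roughly midway between typical C-C single (about 1.54 A), double (about 1.34 A), and triple (about 1.20 A) bond lengths.

```python
import math

# Assumed cutoffs in angstroms, roughly midway between typical
# C-C (1.54), C=C (1.34) and C#C (1.20) bond lengths.
# These values are illustrative, not taken from any toolkit.
DOUBLE_MAX = 1.44
TRIPLE_MAX = 1.27


def guess_cc_bond_order(coord_a, coord_b):
    """Guess a carbon-carbon bond order from two (x, y, z) coordinates."""
    distance = math.dist(coord_a, coord_b)
    if distance <= TRIPLE_MAX:
        return 3
    if distance <= DOUBLE_MAX:
        return 2
    return 1


# Two carbons 1.34 A apart along the x axis
print(guess_cc_bond_order((0.0, 0.0, 0.0), (1.34, 0.0, 0.0)))
```

A distance-only rule like this is crude: aromatic C-C bonds (around 1.40 A) would be labelled as double bonds, and low-resolution coordinates can easily cross a cutoff, which is why residue-name templates are the other approach mentioned above.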
While OB does have a nearly uniform reader API (ie, I can point it to an SD file, SMILES file, etc and get molecules), and built-in gzip support, I do have to specify the format type manually. That is, there's no support for guessing the format based, for example, on the extension.

OpenBabel has SMARTS support, but I can't tell how complete it is. I know it doesn't support double bond stereochemistry, but I think it's otherwise complete, including recursive SMARTS. Is there anything missing? OB also supports using a molecule as the query rather than a SMARTS. Once the match is made, it's easy to get access to the matched atoms and bonds, and match them up to the corresponding query atoms and bonds.

The topic I know the least about is reactions. OB supports reaction SMILES and SMARTS, as well as RXN files. I don't have a good idea of how good that support is, and it's not something I've used much, although my client does. Beyond the support for the query languages/formats, I can't tell how to use the reactions. How would I do a unimolecular reaction (eg, convert all of the carbons in CCCN to OOON)? How would I use a reaction for library generation (eg, convert CCC to first OCCN, then COCN, and lastly CCON)? Is it even possible? I looked but didn't find it.

OB does support some fingerprints. There's a linear hash fingerprint similar to Daylight's and two feature fingerprint implementations, although only one is suggested. There's no MACCS key implementation. There is no support for large/sparse fingerprints, and the only implemented comparison method is the Tanimoto similarity.

OB does not do depiction. For that, people should turn to other libraries, such as OASA. There's no MCS or scaffold identification code in OB. There is a descriptor framework system, support for different forcefields and minimization, and InChI support. There's no nomenclature support.
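For reference, the Tanimoto similarity mentioned in the fingerprint paragraph is the number of bits set in both fingerprints divided by the number of bits set in either. The following is a minimal sketch of that definition, not OB's implementation: it assumes fingerprints are stored as plain Python integers, and the function name and the example bit patterns are made up for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints stored as Python ints."""
    common = bin(fp_a & fp_b).count("1")  # bits set in both
    total = bin(fp_a | fp_b).count("1")   # bits set in either
    if total == 0:
        # Assumed convention: two empty fingerprints score 0.0
        return 0.0
    return common / total


# Two made-up 8-bit fingerprints
fp1 = 0b10110110
fp2 = 0b10010011
print(tanimoto(fp1, fp2))
```

Other comparison measures (Dice, Tversky, and so on) can be built from the same two bit counts plus the individual popcounts, which is what having "only Tanimoto" leaves out.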
OB is cross-platform (here meaning "Windows and Linux"), with access to the library from C++, Python, .Net and Java. The documentation is incomplete and sketchy, but because OB is used by a large number of people, there is support both through the mailing list and by doing a web search for others who have used the code.

I have a metric for testing usability, and that's the number of lines of code needed to count the total number of atoms of all of the records in an input file, using one toolkit vs. pybel. OpenBabel suffers because of the overhead of creating an OBConversion.

I have another metric for comparing error handling, which is to read an SD file with records containing errors (format errors and chemistry errors) and see if I can find the number of records which failed to be read in and the reason for each failure. I haven't figured out how to do that with OB.

====

Thanks in advance!

Andrew
da...@dalkescientific.com

_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss