Re: [Open Babel] Similarity search / Open Babel Warning in ParseSmiles

Chris Morley Thu, 11 Apr 2013 09:33:56 -0700

On 10/04/2013 16:55, Pascal Muller wrote:
> Dear all,
>
> I would like to find similar molecules within a library and compute
> the Tanimoto coefficient.
>
> Am I right in assuming that, using -at0.0, I should retrieve all molecules?
>
> obabel library.fs -Smol.smi -ofpt -at0.0
> (version 2.3.2)
>
> But I get only 1098 compounds out of a total of 1579, along with 262
> warning messages like:
>
> Open Babel Warning  in ParseSmiles
> Invalid SMILES string: 1 unmatched ring bonds.
>
> or:
> Open Babel Warning  in ParseRingBond
> Number not parsed correctly as a ring bond
>
> If I use the library.smi file instead of .fs, I get only 19 molecules
> (-at0.0), and no error message.
>
> If I convert the library SMILES file into SDF, the result depends on
> the query compound in -Smol.smi: sometimes all molecules are converted,
> as I expect with -at0.0, sometimes only a few, and sometimes only one,
> with e.g. this output:
>
> ZINC00089110   183 bits set
> 00410030 01248d00 a0084102 40100e04 40050001 00200809
> 10000480 102c1402 03484900 821a4081 40350020 40011a80
> 540a8518 202d0000 00000420 00000104 24484000 02000b00
> 400920c0 02000101 000a0500 38040694 40808c00 0448829e
> 00200010 20203620 44404010 88040150 008000e2 c11e0003
> ac824010 c3010680
> 1 molecule converted
>
>
> I'm trying to put together a sample SMILES file to send as an example,
> but so far I have not been able to reproduce every case described above.
>
> Have you already encountered such behavior, or is there a known bug
> I'm not aware of?
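For reference, the Tanimoto coefficient of two fingerprints A and B is the number of bits set in both divided by the number of bits set in either, |A AND B| / |A OR B|. The minimal Python sketch below computes it for fingerprints written as hex words, assuming the layout printed by -ofpt in the output quoted above. The function names popcount_hex and tanimoto_hex are hypothetical, and this is an illustration of the formula, not Open Babel's own implementation. Note that two molecules with no bits in common score exactly 0.0, which is what makes -at0.0 a boundary case.

```python
def popcount_hex(words):
    """Count the set bits in a fingerprint given as a list of hex words."""
    return sum(bin(int(w, 16)).count("1") for w in words)


def tanimoto_hex(a_words, b_words):
    """Tanimoto coefficient |A AND B| / |A OR B| of two hex-word fingerprints."""
    if len(a_words) != len(b_words):
        raise ValueError("fingerprints must have the same number of words")
    common = union = 0
    for wa, wb in zip(a_words, b_words):
        a, b = int(wa, 16), int(wb, 16)
        common += bin(a & b).count("1")  # bits set in both
        union += bin(a | b).count("1")   # bits set in either
    # Assumption: two all-zero fingerprints score 0.0 (the ratio is undefined).
    return common / union if union else 0.0


if __name__ == "__main__":
    query = ["00410030", "01248d00"]  # first two words of the fingerprint above
    other = ["00410030", "00000000"]
    print(popcount_hex(query), tanimoto_hex(query, other))
```

Bit order within a word does not affect either count, so the sketch does not need to know how Open Babel numbers the bits.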


Your problem with missing molecules when using fastsearch with -at0.0
seems to be caused by two bugs: 1) Tanimoto coefficients were compared
with > rather than >=, so molecules scoring exactly 0.0 were excluded;
and 2) for files that do not end with a newline, the conversion stopped
on reaching EOF when the last molecule was read. Unusually, fastsearch
reads the molecules non-sequentially, so the last molecule in the file
need not be the last one requested. I'll commit the fixes soon. Thanks
for finding these bugs.
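Both bugs can be illustrated with a toy Python model. This is not Open Babel's code: filter_by_similarity and fetch_in_order are hypothetical, each "molecule" is one SMILES per line, and a list of character offsets stands in for the fastsearch index. The first function shows that a strict > test with threshold 0.0 drops every molecule scoring exactly 0.0. The second shows how, when records are fetched out of order, treating end-of-file on an unterminated last line as the end of the conversion also loses every record that would have been fetched after it.

```python
import io


def filter_by_similarity(scores, threshold, inclusive=True):
    """Keep (name, tanimoto) pairs that pass the threshold.

    inclusive=True uses >= (the fixed comparison);
    inclusive=False uses > (the buggy comparison).
    """
    if inclusive:
        return [(name, t) for name, t in scores if t >= threshold]
    return [(name, t) for name, t in scores if t > threshold]


def fetch_in_order(stream, offsets, order, stop_on_eof):
    """Read one-line records at stored offsets, visiting them in `order`.

    stop_on_eof=True mimics the bug: a read that runs into end-of-file
    (no trailing newline) is treated as the end of the whole conversion.
    """
    records = []
    for i in order:
        stream.seek(offsets[i])
        line = stream.readline()
        hit_eof = not line.endswith("\n")  # read ran into end-of-file
        if stop_on_eof and hit_eof:
            break  # every record later in `order` is silently lost
        records.append(line.rstrip("\n"))
    return records


if __name__ == "__main__":
    scores = [("mol1", 0.0), ("mol2", 0.42), ("mol3", 1.0)]
    print(filter_by_similarity(scores, 0.0, inclusive=False))  # mol1 dropped
    print(filter_by_similarity(scores, 0.0))                   # all kept

    text = "C\nCC\nCCC"  # last record has no trailing newline
    offsets = [0, 2, 5]  # start of each record
    order = [2, 0, 1]    # non-sequential: last record in the file read first
    print(fetch_in_order(io.StringIO(text), offsets, order, stop_on_eof=True))   # []
    print(fetch_in_order(io.StringIO(text), offsets, order, stop_on_eof=False))  # ['CCC', 'C', 'CC']
```

In a sequential reader, hitting EOF on the last line loses at most that line; with out-of-order reads, the same early stop can discard many records, which is consistent with large numbers of molecules going missing.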

Incidentally, the -S option is deprecated; use -s instead, which can 
take either SMARTS or a file name containing molecule(s), and is more 
versatile.

Chris


_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
