On 10/04/2013 16:55, Pascal Muller wrote:
> Dear all,
>
> I would like to find similar molecules within library and compute
> Tanimoto coefficient.
>
> I'm right by assuming that, using -at0.0, I should retrieve all molecules?
>
> obabel library.fs -Smol.smi -ofpt -at0.0
> (version 2.3.2)
>
> But I get only 1098 compounds out of a total of 1579, along with 262
> warning messages like:
>
> Open Babel Warning in ParseSmiles
> Invalid SMILES string: 1 unmatched ring bonds.
>
> or:
> Open Babel Warning in ParseRingBond
> Number not parsed correctly as a ring bond
>
> If I use the library.smi file instead of .fs, I get only 19 molecules
> (-at0.0), and no error message.
>
> If I convert the library smiles file into sdf, and depending on the
> compound in -Smol.smi, sometimes all molecules are converted, as I
> expect with -at0.0, sometimes only a few are converted, sometimes only
> one with e.g. this output:
>
> ZINC00089110 183 bits set
> 00410030 01248d00 a0084102 40100e04 40050001 00200809
> 10000480 102c1402 03484900 821a4081 40350020 40011a80
> 540a8518 202d0000 00000420 00000104 24484000 02000b00
> 400920c0 02000101 000a0500 38040694 40808c00 0448829e
> 00200010 20203620 44404010 88040150 008000e2 c11e0003
> ac824010 c3010680
> 1 molecule converted
>
>
> I'm trying to get a sample smi file to send as example, but until now
> I'm not able to reproduce every case I have written above.
>
> Did you already encounter such behavior, or is there a known bug I'm
> not aware of?

Your problem with missing molecules when using fastsearch with -at0.0
seems to be caused by two bugs:

1) Tanimoto coefficients were compared with the threshold using >
   rather than >=, so a molecule whose coefficient equals the threshold
   (here 0.0) was excluded.

2) For files which do not have a new line at the end, the conversion
   stopped because of an eof when the last molecule was read. In
   fastsearch, unusually, the molecules are read non-sequentially, so
   the last molecule in the file is not necessarily the last one read.

I'll commit the changes soon. Thanks for finding these bugs.

Incidentally, the -S option is deprecated; use -s instead. It is more
versatile and can take either SMARTS or the name of a file containing
one or more molecules.

Chris
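
To illustrate the first bug, here is a minimal Python sketch of a Tanimoto threshold filter. It is not Open Babel's code: the function names, the representation of fingerprints as sets of bit positions, and the three toy molecules are all assumptions made for the example. It only shows why a strict > comparison loses molecules at a threshold of 0.0 while >= keeps them.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of set-bit positions."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def search(query: set[int], library: dict[str, set[int]],
           threshold: float, inclusive: bool) -> list[str]:
    """Return names of library entries whose similarity passes the threshold.

    inclusive=False mimics the strict '>' comparison described above;
    inclusive=True uses '>='.
    """
    hits = []
    for name, bits in library.items():
        score = tanimoto(query, bits)
        keep = score >= threshold if inclusive else score > threshold
        if keep:
            hits.append(name)
    return hits


# Hypothetical toy data: mol_c shares no bits with the query, so its score is 0.0.
query = {1, 2, 3}
library = {
    "mol_a": {1, 2, 3},
    "mol_b": {2, 3, 4},
    "mol_c": {7, 8, 9},
}

strict_hits = search(query, library, 0.0, inclusive=False)
inclusive_hits = search(query, library, 0.0, inclusive=True)

print(strict_hits)     # mol_c is missing
print(inclusive_hits)  # all three molecules
```

With the strict comparison, any molecule that has no bits in common with the query scores exactly 0.0 and is dropped, which would explain getting fewer molecules than the library contains even with -at0.0.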