Hi, I'm doing regressions for my chemfp package.
I have OpenBabel 2.2.3, 2.3.0 and today's "2.3.90" compiled from SVN, each compiled against various versions of Python. What's below is with Python 2.6.

It looks like there's a change to the FP4 fingerprints, but I can't figure out why. Here's a reproducible example:

  ---> Version 2.2.3 and 2.3.0 <---

bash-3.2$ /Users/dalke/envs/py26-ob230/bin/python2.6
Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import openbabel
>>> openbabel.OBReleaseVersion()
'2.3.0'
>>>
>>> import pybel
>>> reader = pybel.readfile("sdf","tests/pubchem.sdf")
>>> mol=reader.next()
>>> mol.write("smi")
'Clc1c(/C=C/C(=O)NNC(=O)Cn2nc(cc2C)C)c(F)ccc1\t9425004\n'
>>> mol.calcfp("FP4").bits
[1, 5, 88, 137, 171, 172, 180, 181, 184, 274, 275, 287, 295, 300, 301, 302, 303]

  ---> Version 2.3.90 (built today from SVN) <---

bash-3.2$ /Users/dalke/envs/py26-ob23svn1/bin/python2.6
Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import openbabel
>>> openbabel.OBReleaseVersion()
'2.3.90'
>>> import pybel
>>> reader = pybel.readfile("sdf","tests/pubchem.sdf")
>>> mol = reader.next()
>>> mol.write("smi")
'Clc1c(/C=C/C(=O)NNC(=O)Cn2nc(cc2C)C)c(F)ccc1\t9425004\n'
>>> mol.calcfp("FP4").bits
[1, 5, 88, 137, 171, 172, 180, 181, 184, 274, 275, 287, 289, 290, 295, 300, 301, 302, 303]

You can see that the subversion build has two new bits: 289 and 290.

The SMARTS_InteLigand.txt file is unchanged between the two releases. Indeed, that file has had no SVN change for many years. The canonical SMILES are identical, so it's unlikely to be an aromaticity perception issue.
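The difference between the two bit lists can be double-checked mechanically instead of by eye. This is a minimal sketch in plain Python with the two lists pasted in from the sessions above; it does not need Open Babel, and the variable names are only illustrative.

```python
# Bit lists copied verbatim from the two interpreter sessions above.
bits_230 = [1, 5, 88, 137, 171, 172, 180, 181, 184, 274, 275, 287,
            295, 300, 301, 302, 303]
bits_svn = [1, 5, 88, 137, 171, 172, 180, 181, 184, 274, 275, 287,
            289, 290, 295, 300, 301, 302, 303]

# Bits set only by the SVN build, and bits set only by the 2.3.0 build.
only_in_svn = sorted(set(bits_svn) - set(bits_230))
only_in_230 = sorted(set(bits_230) - set(bits_svn))

print(only_in_svn)
print(only_in_230)
```

The same comparison could be run over every record in a test file to see whether 289 and 290 are the only bits that ever differ.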
I believe pybel's bits start counting from 1, since if I extract the SMARTS definitions from the SMARTS file I see that the first one (which I've labeled '1') is:

  1 Primary_carbon: [CX4H3][#6]

This means that 289 and 290 are the two patterns that use the "D" primitive together with '/' and '\' bonds:

  288 Conjugated_tripple_bond: *#*[*]=,#,:[*]
  289 Cis_double_bond: */[D2]=[D2]\*
  290 Trans_double_bond: */[D2]=[D2]/*
  291 Mixed_anhydrides: [$(*=O),$([#16,#14,#5]),$([#7]([#6]=[OX1]))][#8X2][$(*=O),$([#16,#14,#5]),$([#7]([#6]=[OX1]))]

Has the SMARTS pattern matcher for either of those changed? (I'm betting stereo.)

                                Andrew
                                da...@dalkescientific.com

Here's the original structure file
===================================
9425004
  -OEChem-01150805002D

 40 41  0     0  0  0  0  0  0999 V2000
    2.0000   -3.0580    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    5.4641   -3.0580    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    7.1962    0.9420    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321   -0.0580    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    7.1962    2.9420    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    7.3007    3.9365    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    5.4641    0.9420    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    5.4641   -0.0580    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    8.1097    2.5353    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.3301    2.4420    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    8.7788    3.2784    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    8.2788    4.1444    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    8.3176    1.5571    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.3301    1.4420    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    8.6856    5.0580    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321   -3.0580    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321   -2.0580    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.5981   -0.5580    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660   -3.5580    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.5981   -3.5580    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.5981   -1.5580    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660   -4.5580    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.5981   -4.5580    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321   -5.0580    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.1181    3.0246    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    5.7195    2.3343    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    9.3954    3.2136    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    7.7112    1.4282    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    8.4465    0.9507    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    8.9241    1.6860    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    8.1192    5.3102    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    8.9377    5.6244    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    9.2520    4.8058    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.9272    1.2520    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    6.0010   -0.3680    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.1951   -1.7480    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    5.1350   -1.8680    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.3291   -4.8680    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    5.1350   -4.8680    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321   -5.6780    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1 19  1  0  0  0  0
  2 20  1  0  0  0  0
  3 14  2  0  0  0  0
  4 18  2  0  0  0  0
  5  6  1  0  0  0  0
  5  9  1  0  0  0  0
  5 10  1  0  0  0  0
  6 12  2  0  0  0  0
  7  8  1  0  0  0  0
  7 14  1  0  0  0  0
  7 34  1  0  0  0  0
  8 18  1  0  0  0  0
  8 35  1  0  0  0  0
  9 11  2  0  0  0  0
  9 13  1  0  0  0  0
 10 14  1  0  0  0  0
 10 25  1  0  0  0  0
 10 26  1  0  0  0  0
 11 12  1  0  0  0  0
 11 27  1  0  0  0  0
 12 15  1  0  0  0  0
 13 28  1  0  0  0  0
 13 29  1  0  0  0  0
 13 30  1  0  0  0  0
 15 31  1  0  0  0  0
 15 32  1  0  0  0  0
 15 33  1  0  0  0  0
 16 17  1  0  0  0  0
 16 19  1  0  0  0  0
 16 20  2  0  0  0  0
 17 21  2  0  0  0  0
 17 36  1  0  0  0  0
 18 21  1  0  0  0  0
 19 22  2  0  0  0  0
 20 23  1  0  0  0  0
 21 37  1  0  0  0  0
 22 24  1  0  0  0  0
 22 38  1  0  0  0  0
 23 24  2  0  0  0  0
 23 39  1  0  0  0  0
 24 40  1  0  0  0  0
M  END
> <PUBCHEM_COMPOUND_CID>
9425004

$$$$
===================================
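P.S. The bit numbering used earlier in this message (first definition is bit 1) can be reproduced with a small script. This is only a sketch: it assumes the SMARTS file uses the "Name: SMARTS" layout quoted above, and that blank lines and lines starting with '#' do not consume a bit number, which I have not checked against the Open Babel source. The function name `number_patterns` and the embedded excerpt are illustrative, not part of any library.

```python
# Excerpt in the "Name: SMARTS" layout quoted in this message. Assumption:
# '#' comment lines and blank lines do not count toward the bit number.
SAMPLE = r"""
# excerpt only, starting at the 288th definition
Conjugated_tripple_bond: *#*[*]=,#,:[*]
Cis_double_bond: */[D2]=[D2]\*
Trans_double_bond: */[D2]=[D2]/*
"""

def number_patterns(lines, start=1):
    """Map bit number -> (name, SMARTS), counting definitions from `start`."""
    numbered = {}
    bit = start
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only; the SMARTS itself may contain ':'.
        name, smarts = line.split(":", 1)
        numbered[bit] = (name.strip(), smarts.strip())
        bit += 1
    return numbered

# start=288 only because this excerpt omits the first 287 definitions;
# for the full file the default start=1 would apply.
patterns = number_patterns(SAMPLE.splitlines(), start=288)
for bit in (289, 290):
    print("%d %s: %s" % ((bit,) + patterns[bit]))
```

Splitting on the first colon only matters here because pattern 288 uses ':' as an aromatic bond symbol inside the SMARTS.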