Hi,

  I'm doing regressions for my chemfp package.

  I have OpenBabel 2.2.3, 2.3.0 and today's "2.3.90" built from SVN, each 
compiled against various versions of Python. What's below is with Python 2.6.

  It looks like there's a change to the FP4 fingerprints, but I can't figure 
out why.

Here's a reproducible example:


  ---> Version 2.2.3 and 2.3.0 <---

bash-3.2$ /Users/dalke/envs/py26-ob230/bin/python2.6
Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import openbabel
>>> openbabel.OBReleaseVersion()
'2.3.0'
>>>
>>> import pybel
>>> reader = pybel.readfile("sdf","tests/pubchem.sdf")
>>> mol=reader.next()
>>> mol.write("smi")
'Clc1c(/C=C/C(=O)NNC(=O)Cn2nc(cc2C)C)c(F)ccc1\t9425004\n'
>>> mol.calcfp("FP4").bits
[1, 5, 88, 137, 171, 172, 180, 181, 184, 274, 275, 287, 295, 300, 301, 302, 303]

  ---> Version 2.3.90 (built today from SVN) <---

bash-3.2$ /Users/dalke/envs/py26-ob23svn1/bin/python2.6 
Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import openbabel
>>> openbabel.OBReleaseVersion()
'2.3.90'
>>> import pybel
>>> reader = pybel.readfile("sdf","tests/pubchem.sdf")
>>> mol = reader.next()
>>> mol.write("smi")
'Clc1c(/C=C/C(=O)NNC(=O)Cn2nc(cc2C)C)c(F)ccc1\t9425004\n'
>>> mol.calcfp("FP4").bits
[1, 5, 88, 137, 171, 172, 180, 181, 184, 274, 275, 287, 289, 290, 295, 300, 301, 302, 303]



The SVN build sets two additional bits: 289 and 290.
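For a regression suite it helps to report exactly which bits differ rather than eyeballing the lists. A minimal plain-Python sketch (no Open Babel needed) using the two bit lists from the sessions above; `diff_bits` is a hypothetical helper, not part of pybel or chemfp:

```python
# Bit lists copied by hand from the two interactive sessions above.
bits_230 = [1, 5, 88, 137, 171, 172, 180, 181, 184, 274, 275, 287, 295,
            300, 301, 302, 303]
bits_svn = [1, 5, 88, 137, 171, 172, 180, 181, 184, 274, 275, 287, 289,
            290, 295, 300, 301, 302, 303]


def diff_bits(old, new):
    """Return (bits only in new, bits only in old), each sorted."""
    old_set, new_set = set(old), set(new)
    return sorted(new_set - old_set), sorted(old_set - new_set)


added, removed = diff_bits(bits_230, bits_svn)
# The "%s" % (x,) form prints the same under Python 2.6 and Python 3.
print("added: %s" % (added,))      # added: [289, 290]
print("removed: %s" % (removed,))  # removed: []
```

Reporting both directions matters: a regression could just as easily clear bits as set new ones.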

The SMARTS_InteLigand.txt file is unchanged between the two releases. Indeed, 
it hasn't changed in SVN for many years.

The canonical SMILES are identical, so it's unlikely to be an aromaticity 
perception issue.


I believe pybel's bit numbering starts at 1, because when I extract the SMARTS 
definitions from the SMARTS file, the first one (which I've labeled '1') is:

1 Primary_carbon: [CX4H3][#6]
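Extracting and numbering the definitions can be sketched as below. This assumes each definition in SMARTS_InteLigand.txt is a single "Name: SMARTS" line and that blank lines and lines starting with '#' are comments; that layout is an assumption, and `number_patterns` and the two-line excerpt are hypothetical. The split is on the first colon only, because a SMARTS can itself contain ':' (the aromatic bond), as in Conjugated_tripple_bond:

```python
def number_patterns(lines, start=1):
    """Number 'Name: SMARTS' definitions, skipping blanks and '#' comments.

    Assumes the file layout described above; numbering begins at `start`
    (1, to line up with pybel's .bits output).
    """
    numbered = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only: the SMARTS may contain ':' bonds.
        name, _, smarts = line.partition(":")
        numbered.append((start + len(numbered), name.strip(), smarts.strip()))
    return numbered


# Hypothetical two-line excerpt; in the real file Conjugated_tripple_bond
# is pattern 288, here it is simply the second entry.
sample = """\
# hypothetical comment line
Primary_carbon: [CX4H3][#6]
Conjugated_tripple_bond: *#*[*]=,#,:[*]
""".splitlines()

for bit, name, smarts in number_patterns(sample):
    print("%d %s: %s" % (bit, name, smarts))

# Against the real file (path is hypothetical):
# with open("SMARTS_InteLigand.txt") as f:
#     table = number_patterns(f)
```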

This means that bits 289 and 290 are the cis and trans double bond patterns, 
which use the "D" (degree) primitive and the '/' and '\' directional bonds:

288 Conjugated_tripple_bond: *#*[*]=,#,:[*]
289 Cis_double_bond: */[D2]=[D2]\*
290 Trans_double_bond: */[D2]=[D2]/*
291 Mixed_anhydrides: [$(*=O),$([#16,#14,#5]),$([#7]([#6]=[OX1]))][#8X2][$(*=O),$([#16,#14,#5]),$([#7]([#6]=[OX1]))]
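To check the stereo hunch mechanically, this sketch flags which of the neighbouring patterns contain the directional-bond symbols '/' or '\'. The `patterns` dict is just a hand copy of the four lines above, and `uses_directional_bonds` is a hypothetical helper doing a plain string test, not a SMARTS parser:

```python
# Patterns 288-291 as listed above (1-based numbers, as in pybel).
patterns = {
    288: ("Conjugated_tripple_bond", "*#*[*]=,#,:[*]"),
    289: ("Cis_double_bond", "*/[D2]=[D2]\\*"),
    290: ("Trans_double_bond", "*/[D2]=[D2]/*"),
    291: ("Mixed_anhydrides",
          "[$(*=O),$([#16,#14,#5]),$([#7]([#6]=[OX1]))][#8X2]"
          "[$(*=O),$([#16,#14,#5]),$([#7]([#6]=[OX1]))]"),
}


def uses_directional_bonds(smarts):
    """True if the SMARTS contains '/' or '\\' (directional bonds)."""
    return "/" in smarts or "\\" in smarts


changed_bits = [289, 290]
for bit in changed_bits:
    name, smarts = patterns[bit]
    print("%d %s stereo=%s" % (bit, name, uses_directional_bonds(smarts)))

# Which of the neighbouring patterns depend on double-bond stereo at all?
stereo_bits = sorted(b for b, (_, s) in patterns.items()
                     if uses_directional_bonds(s))
print(stereo_bits)  # [289, 290]
```

Only the two bits that changed involve directional bonds, which is consistent with (though not proof of) a change in how stereo is matched.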

Has the SMARTS pattern matching for either of those features changed? (My bet 
is on stereo.)


                                Andrew
                                da...@dalkescientific.com

Here's the original structure file:

===================================
9425004
  -OEChem-01150805002D

 40 41  0     0  0  0  0  0  0999 V2000
    2.0000   -3.0580    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    5.4641   -3.0580    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
    7.1962    0.9420    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321   -0.0580    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    7.1962    2.9420    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    7.3007    3.9365    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    5.4641    0.9420    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    5.4641   -0.0580    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    8.1097    2.5353    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.3301    2.4420    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    8.7788    3.2784    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    8.2788    4.1444    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    8.3176    1.5571    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.3301    1.4420    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    8.6856    5.0580    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321   -3.0580    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321   -2.0580    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.5981   -0.5580    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660   -3.5580    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.5981   -3.5580    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.5981   -1.5580    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660   -4.5580    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.5981   -4.5580    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321   -5.0580    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.1181    3.0246    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    5.7195    2.3343    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    9.3954    3.2136    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    7.7112    1.4282    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    8.4465    0.9507    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    8.9241    1.6860    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    8.1192    5.3102    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    8.9377    5.6244    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    9.2520    4.8058    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.9272    1.2520    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    6.0010   -0.3680    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.1951   -1.7480    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    5.1350   -1.8680    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.3291   -4.8680    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    5.1350   -4.8680    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321   -5.6780    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1 19  1  0  0  0  0
  2 20  1  0  0  0  0
  3 14  2  0  0  0  0
  4 18  2  0  0  0  0
  5  6  1  0  0  0  0
  5  9  1  0  0  0  0
  5 10  1  0  0  0  0
  6 12  2  0  0  0  0
  7  8  1  0  0  0  0
  7 14  1  0  0  0  0
  7 34  1  0  0  0  0
  8 18  1  0  0  0  0
  8 35  1  0  0  0  0
  9 11  2  0  0  0  0
  9 13  1  0  0  0  0
 10 14  1  0  0  0  0
 10 25  1  0  0  0  0
 10 26  1  0  0  0  0
 11 12  1  0  0  0  0
 11 27  1  0  0  0  0
 12 15  1  0  0  0  0
 13 28  1  0  0  0  0
 13 29  1  0  0  0  0
 13 30  1  0  0  0  0
 15 31  1  0  0  0  0
 15 32  1  0  0  0  0
 15 33  1  0  0  0  0
 16 17  1  0  0  0  0
 16 19  1  0  0  0  0
 16 20  2  0  0  0  0
 17 21  2  0  0  0  0
 17 36  1  0  0  0  0
 18 21  1  0  0  0  0
 19 22  2  0  0  0  0
 20 23  1  0  0  0  0
 21 37  1  0  0  0  0
 22 24  1  0  0  0  0
 22 38  1  0  0  0  0
 23 24  2  0  0  0  0
 23 39  1  0  0  0  0
 24 40  1  0  0  0  0
M  END
> <PUBCHEM_COMPOUND_CID>
9425004

$$$$
===================================
_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
