Hi all,

  I just updated from OB 2.4.1 to the most recent version from version control. 
(This is part of a migration to Python 3.7.)

I noticed that the MACCS key output changed for about 1% of the first 
27008 ChEMBL-24 structures, and the FP2 fingerprints changed for a bit more 
than 1% of the structures. Here's a reproducible example for MACCS:

% cat CHEMBL23759.smi
O=C1CC(=O)[N+](CC2CC2)=C2SC=CN12 CHEMBL23759

[py36-all] [xebulon:~/tmp] dalke% obabel CHEMBL23759.smi -ofps -xfMACCS
#FPS1
#num_bits=166
#type=OpenBabel-MACCS/1
#software=OpenBabel/2.4.1
#source=CHEMBL23759.smi
#date=2019-01-07T09:35:01
000020000840010001b495891b63d043c9e12c6f1f      CHEMBL23759
1 molecule converted

[py37-2019-1] [xebulon:~/tmp] dalke% obabel CHEMBL23759.smi -ofps -xfMACCS
#FPS1
#num_bits=166
#type=OpenBabel-MACCS/1
#software=OpenBabel/2.4.90
#source=CHEMBL23759.smi
#date=2019-01-07T09:34:39
000020000850010000b495891f63d04389612c6f1d      CHEMBL23759
1 molecule converted

If you compare the two strings you'll see several differences. (I picked a 
structure with many differences.)


000020000840010001b495891b63d043c9e12c6f1f
000020000850010000b495891f63d04389612c6f1d
          ^      ^       ^      ^ ^      ^
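
To make the comparison less error-prone than lining up carets by eye, here is a minimal Python sketch that lists the differing bit positions. It is my own illustration, not part of Open Babel or chemfp; the function name differing_bits is hypothetical, and it assumes the FPS convention that bit 0 is the least-significant bit of the first byte.

```python
def differing_bits(hex_a, hex_b):
    """Return the bit indices at which two hex-encoded fingerprints differ.

    Assumption: bit 0 is the least-significant bit of the first byte.
    """
    a = bytes.fromhex(hex_a)
    b = bytes.fromhex(hex_b)
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    bits = []
    for byte_index, (x, y) in enumerate(zip(a, b)):
        diff = x ^ y  # bits set in exactly one of the two fingerprints
        for bit in range(8):
            if diff & (1 << bit):
                bits.append(byte_index * 8 + bit)
    return bits


old = "000020000840010001b495891b63d043c9e12c6f1f"  # from OpenBabel/2.4.1
new = "000020000850010000b495891f63d04389612c6f1d"  # from OpenBabel/2.4.90
print(differing_bits(old, new))  # [44, 64, 98, 134, 143, 161]
```

Under that bit-order assumption, six bits differ between the two fingerprints.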

The most common changes from the subset of ChEMBL I tested are:

  Fewer matches in the new code for:
    [#8]!:*:*   Onot%A%A
    c:n   C%N
    [!#1]!:*:*!:[!#1]   Anot%A%Anot%A
    a   Aromatic

  More matches for:
    [#6]=[#6]   C=C

  Different matches for:
    [#7]!:*:*   Nnot%A%A


The same structure (CHEMBL23759) also has a number of changes for the FP2 
fingerprint, and changes for the FP3 and FP4 fingerprints. I haven't analyzed 
how many structures have changed for the latter two.

I assume it's a side effect of a change to aromaticity perception, and my guess 
is it's due to the following commit:

commit 1991439efd920f27cd9755fe8abf5c18699d4a58
Merge: a06e271 d78062b
Author: Geoff Hutchison <geoff.hutchi...@gmail.com>
Date:   Mon Oct 2 16:40:08 2017 -0400

    Merge pull request #1638 from baoilleach/daylightarom

    Implement the Daylight aromaticity model as described by John Mayfield


Is my diagnosis correct?

Has there only been one such change between the 2.4.1 release and now?

Since the fingerprint output has changed, would someone update the version 
number in Open Babel's FPS output from "1" to something higher?

The "type" version should be updated when the fingerprint implementation 
changes. Chemfp currently has:

  OpenBabel-MACCS/1 -- for pre-2012 versions, before a bug-fix in the SMARTS
                       definitions
  OpenBabel-MACCS/2 -- for OB 2.4.1

and /1 for the FP2, FP3, and FP4 types.

The version information helps identify possible incompatibility problems.
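
As a rough illustration of why this matters, here is a small Python sketch that reads the "#key=value" header lines of an FPS file and compares two headers. The helper name parse_fps_header is hypothetical (it is not a chemfp function), and the header lines are copied from the two obabel runs above.

```python
def parse_fps_header(lines):
    """Collect the '#key=value' metadata lines from the start of an FPS file."""
    header = {}
    for line in lines:
        if not line.startswith("#"):
            break  # first fingerprint record, so the header is over
        key, sep, value = line[1:].rstrip("\n").partition("=")
        if sep:  # skips the '#FPS1' signature line, which has no '='
            header[key] = value
    return header


old_header = parse_fps_header([
    "#FPS1",
    "#num_bits=166",
    "#type=OpenBabel-MACCS/1",
    "#software=OpenBabel/2.4.1",
])
new_header = parse_fps_header([
    "#FPS1",
    "#num_bits=166",
    "#type=OpenBabel-MACCS/1",
    "#software=OpenBabel/2.4.90",
])

# The type strings are identical even though the fingerprints differ,
# so only the software line hints at the incompatibility.
print(old_header["type"] == new_header["type"])          # True
print(old_header["software"] == new_header["software"])  # False
```

A tool that checks only the "type" field would treat these two files as compatible, which is the problem a version bump would avoid.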

I am about to add the following types to chemfp, for the tentative reason 
"support the Daylight aromaticity model added in October 2017":

  OpenBabel-MACCS/2 to OpenBabel-MACCS/3
  OpenBabel-FP2/1 to OpenBabel-FP2/2
  OpenBabel-FP3/1 to OpenBabel-FP3/2
  OpenBabel-FP4/1 to OpenBabel-FP4/2

I would appreciate it if Open Babel produced the same version string as chemfp.

The relevant code is in src/formats/fpsformat.cpp line 130:

        << "#type=OpenBabel-" << _pFP->GetID() << "/1" << '\n'

That's a hard-coded version number for all fingerprint types.

I don't think the OB registry system supports versioning of the entire 
fingerprinting process, which makes sense from the plugin view because the 
plugin only knows about the format part, and not the fingerprint generation 
code. I don't know how the code might change to handle that information in the 
future.
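
Purely to illustrate what per-type version information could look like, as opposed to one hard-coded "/1", here is a hypothetical Python sketch. It is not Open Babel code; the table and the type_line function are made up for this example, and the version numbers are the tentative chemfp ones listed earlier in this email.

```python
# Hypothetical per-type versions (tentative chemfp numbering, not an
# Open Babel API): each fingerprint type carries its own version.
FINGERPRINT_VERSIONS = {
    "MACCS": 3,
    "FP2": 2,
    "FP3": 2,
    "FP4": 2,
}


def type_line(fp_id, versions=FINGERPRINT_VERSIONS):
    """Build the FPS '#type=' header line for the given fingerprint ID."""
    return f"#type=OpenBabel-{fp_id}/{versions[fp_id]}"


for fp_id in ("MACCS", "FP2", "FP3", "FP4"):
    print(type_line(fp_id))
```

An unknown fingerprint ID raises KeyError here, which seems better than silently emitting a default version.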

(Chemfp internally has a similar problem. Even there I'm not sure how I'll 
handle it.)

The easy fix for now is likely to replace the "/1" with a "/3".

If the Open Babel developers decide to make that change then I will use 
"OpenBabel-FP2/3", etc. in chemfp instead of "/2".

That means there wouldn't be an "OpenBabel-FP2/2", FP3/2, or FP4/2, but I think 
that's okay.

Best regards,

                                Andrew
                                da...@dalkescientific.com



