Hi all,

I just updated from OB 2.4.1 to the most recent version from version control. (This is part of a migration to Python 3.7.)

I noticed that the MACCS key implementation changed for about 1% of the first 27008 ChEMBL-24 structures, and the FP2 fingerprints changed for a bit more than 1% of the structures.

Here's a reproducible example for MACCS:

% cat CHEMBL23759.smi
O=C1CC(=O)[N+](CC2CC2)=C2SC=CN12 CHEMBL23759

[py36-all] [xebulon:~/tmp] dalke% obabel CHEMBL23759.smi -ofps -xfMACCS
#FPS1
#num_bits=166
#type=OpenBabel-MACCS/1
#software=OpenBabel/2.4.1
#source=CHEMBL23759.smi
#date=2019-01-07T09:35:01
000020000840010001b495891b63d043c9e12c6f1f CHEMBL23759
1 molecule converted

[py37-2019-1] [xebulon:~/tmp] dalke% obabel CHEMBL23759.smi -ofps -xfMACCS
#FPS1
#num_bits=166
#type=OpenBabel-MACCS/1
#software=OpenBabel/2.4.90
#source=CHEMBL23759.smi
#date=2019-01-07T09:34:39
000020000850010000b495891f63d04389612c6f1d CHEMBL23759
1 molecule converted

If you compare the two strings you'll see several differences (I picked one with many differences):

000020000840010001b495891b63d043c9e12c6f1f
000020000850010000b495891f63d04389612c6f1d
          ^      ^       ^      ^ ^      ^

The most common changes from the subset of ChEMBL I tested are:

Fewer matches in the new code for:

  [#8]!:*:*            Onot%A%A
  c:n                  C%N
  [!#1]!:*:*!:[!#1]    Anot%A%Anot%A
  a                    Aromatic

More matches for:

  [#6]=[#6]            C=C

Different matches for:

  [#7]!:*:*            Nnot%A%A

The same structure (CHEMBL23759) also has a number of changes for the FP2 fingerprint, and changes for the FP3 and FP4 fingerprints. I haven't analyzed how many structures have changed for the latter two.

I assume it's a side effect of a change to aromaticity perception, and my guess is it's due to the following commit:

commit 1991439efd920f27cd9755fe8abf5c18699d4a58
Merge: a06e271 d78062b
Author: Geoff Hutchison <geoff.hutchi...@gmail.com>
Date:   Mon Oct 2 16:40:08 2017 -0400

    Merge pull request #1638 from baoilleach/daylightarom

    Implement the Daylight aromaticity model as described by John Mayfield

Is my diagnosis correct? Has there only been one such change between the 2.4.1 release and now?
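The comparison of the two MACCS strings above can also be done programmatically. The following is a minimal sketch, not part of Open Babel or chemfp: the helper names `set_bits` and `diff_bits` are made up for illustration, and it assumes the usual FPS hex encoding in which bit 0 is the least significant bit of the first byte.

```python
def set_bits(hex_fp):
    """Return the set of bit indices that are on in a hex-encoded fingerprint.

    Assumption: bit 0 is the least significant bit of the first byte.
    """
    data = bytes.fromhex(hex_fp)
    return {
        byte_index * 8 + bit
        for byte_index, byte in enumerate(data)
        for bit in range(8)
        if byte & (1 << bit)
    }


def diff_bits(old_hex, new_hex):
    """Return (bits set only in old, bits set only in new), each sorted."""
    old_bits = set_bits(old_hex)
    new_bits = set_bits(new_hex)
    return sorted(old_bits - new_bits), sorted(new_bits - old_bits)


# The two CHEMBL23759 MACCS fingerprints from the obabel output above.
old_fp = "000020000840010001b495891b63d043c9e12c6f1f"  # OpenBabel/2.4.1
new_fp = "000020000850010000b495891f63d04389612c6f1d"  # OpenBabel/2.4.90

only_old, only_new = diff_bits(old_fp, new_fp)
print("bits set only by 2.4.1: ", only_old)
print("bits set only by 2.4.90:", only_new)
```

For this structure the sketch reports four bits that only the 2.4.1 build sets and two that only the 2.4.90 build sets. Mapping a bit index back to a specific MACCS key depends on how the keys are numbered in the fingerprint, which is not checked here.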
Since the fingerprint output has changed, would someone update the version number in Open Babel's FPS output from "1" to something higher? The "type" version should be updated when the fingerprint implementation changes. Chemfp currently has:

  OpenBabel-MACCS/1 -- for pre-2012 versions, before a bug-fix in the SMARTS definitions
  OpenBabel-MACCS/2 -- for OB 2.4.1

and /1 for the FP2, FP3, and FP4 types. The version information helps identify possible incompatibility problems.

I am about to add the following types to chemfp, for the tentative reason "support the Daylight aromaticity model added in October 2017":

  OpenBabel-MACCS/2 to OpenBabel-MACCS/3
  OpenBabel-FP2/1 to OpenBabel-FP2/2
  OpenBabel-FP3/1 to OpenBabel-FP3/2
  OpenBabel-FP4/1 to OpenBabel-FP4/2

I would appreciate it if Open Babel produced the same version string as chemfp. The relevant code is in src/formats/fpsformat.cpp line 130:

  << "#type=OpenBabel-" << _pFP->GetID() << "/1" << '\n'

That's a hard-coded version number for all fingerprint types. I don't think the OB registry system supports versioning of the entire fingerprinting process, which makes sense from the plugin view because the plugin only knows about the format part, and not the fingerprint generation code. I don't know how the code might change to handle that information in the future. (Chemfp internally has a similar problem. Even there I'm not sure how I'll handle it.)

The easy fix for now is likely to replace the "/1" with a "/3". If the Open Babel developers decide to make that change then I'll use "OpenBabel-FP2/3", etc. instead of "/2". That means there wouldn't be an "OpenBabel-FP2/2", FP3/2, or FP4/2, but I think that's okay.

Best regards,

Andrew
da...@dalkescientific.com