Dear Samdani,

Please send questions about Open Babel to the mailing list. I've cc'ed this.

ECFP4 first appeared in Open Babel in v2.4.0. There were a number of problems that were only fixed after 2.4.1 was released, so that is the origin of most of the differences you will see unless you use the development version. Apart from that, different implementations use different hash functions and may also set different numbers of bits. This will affect the hash collisions, but should have only a small effect on 1024-bit fingerprints.

Regards,
- Noel

On Sat, 29 Sep 2018 at 11:37, Samdani A <samdani1...@gmail.com> wrote:
> Dear sir,
>
> I had read your article on "*Comparing structural fingerprints using a
> literature-based similarity benchmark*" from Journal of Cheminformatics.
> I have few questions regarding the fingerprint search using ECFP4.
>
> 1. I used OpenBabel for performing fingerprint search using ECFP4 with the
> 2.4.0 version. I got the results as shown below.
> babel query.smi test.smi -ofpt -xfECFP4
> >CHEMBL2332097
> >CHEMBL376321 Tanimoto from CHEMBL2332097 = -1
> >CHEMBL2333882 Tanimoto from CHEMBL2332097 = -1
> >CHEMBL1922047 Tanimoto from CHEMBL2332097 = -1
> >CHEMBL1079921 Tanimoto from CHEMBL2332097 = -1
> >CHEMBL2332097 Tanimoto from CHEMBL2332097 = 0.595116
>
> why the same ligand shows a tanimoto value of 0.595116 and why the ligands
> with no similarity shows -1 value? I would also like to know what is the
> default bit-vector length for ECFP4 in obabel?
>
> 2. I downloaded the scripts used for benchmarking from the github link
> provided in the manuscript and from the fingerprint_lib.py, the fingerprint
> dict for ECFP4 with 1024bit I used for tanimoto calculation the result came
> as follows,
>
> CHEMBL2332097 vs CHEMBL2332097
> ---------------------------------------------------
> from rdkit import Chem
> from rdkit.Chem import AllChem
> from rdkit import DataStructs
> FILE1=Chem.MolFromSmiles('COc1cccc(CN(C)C(=O)Nc2ccc(cc2)c3cn[nH]c3)c1')
> FILE2=Chem.MolFromSmiles('COc1cccc(CN(C)C(=O)Nc2ccc(cc2)c3cn[nH]c3)c1')
> F1bit=AllChem.GetMorganFingerprintAsBitVect(FILE1,2,nBits=1024)
> F2bit=AllChem.GetMorganFingerprintAsBitVect(FILE2,2,nBits=1024)
> print(DataStructs.FingerprintSimilarity(F1bit,F2bit))
>
> 1.0
>
> CHEMBL2332097 vs CHEMBL376321
> ------------------------------------------------
>
> from rdkit import Chem
> from rdkit.Chem import AllChem
> from rdkit import DataStructs
> FILE1=Chem.MolFromSmiles('COc1cccc(CN(C)C(=O)Nc2ccc(cc2)c3cn[nH]c3)c1')
>
> FILE2=Chem.MolFromSmiles('CCn1c(nc2cnc(Oc3cccc(NC(=O)c4ccc(CN5CCOCC5)cc4)c3)cc12)c6nonc6N')
> F1bit=AllChem.GetMorganFingerprintAsBitVect(FILE1,2,nBits=1024)
> F2bit=AllChem.GetMorganFingerprintAsBitVect(FILE2,2,nBits=1024)
> print(DataStructs.FingerprintSimilarity(F1bit,F2bit))
>
> 0.188679245283
>
> Why the tanimoto value is different between obabel and rdkit?
>
> Kindly clarify my doubt.
>
> Regards
> Samdani
>
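
As a rough illustration of the point in the reply about hash functions and bit counts, here is a minimal pure-Python sketch. It is not how Open Babel or RDKit compute ECFP4: the feature names (`env0`, `env1`, ...), the `fold` and `tanimoto` helpers, and the use of a salted SHA-1 to stand in for "a different implementation's hash function" are all assumptions made for the example. It only shows that the same underlying feature sets can give slightly different Tanimoto values once they are hashed into a fixed number of bits, and that short fingerprints are more exposed to collisions than 1024-bit ones.

```python
import hashlib

def fold(features, n_bits, salt=""):
    """Hash each feature identifier to a bit position in [0, n_bits).

    The salt is a stand-in for a different hash function (assumption).
    """
    bits = set()
    for feat in features:
        digest = hashlib.sha1((salt + feat).encode()).digest()
        bits.add(int.from_bytes(digest[:8], "big") % n_bits)
    return bits

def tanimoto(a, b):
    """Tanimoto coefficient of two sets of on-bits (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

# Hypothetical feature identifiers standing in for circular atom environments.
mol_a = {f"env{i}" for i in range(0, 40)}
mol_b = {f"env{i}" for i in range(25, 70)}

# Similarity on the unhashed feature sets: 15 shared out of 70 distinct.
exact = tanimoto(mol_a, mol_b)
print("unfolded", round(exact, 3))

# Same features, different bit lengths and different (salted) hash functions.
for n_bits in (64, 1024):
    for salt in ("impl1:", "impl2:"):
        fa = fold(mol_a, n_bits, salt)
        fb = fold(mol_b, n_bits, salt)
        print(n_bits, salt, round(tanimoto(fa, fb), 3))
```

Whatever the hash or bit length, a molecule compared with itself using the same settings gives 1.0 in this sketch, since both fingerprints are built identically.
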
_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss