Hello,

I have noticed an issue with InChI generation in a rather specific situation.

There are cases where the following code produces two different InChIs, even though they ought to be identical:

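from functools import reduce
from rdkit import Chem

# old_mol: one of the molecules read from the attached SD file
new_mol = reduce(Chem.CombineMols, Chem.GetMolFrags(old_mol, asMols=True))

old_inchi = Chem.MolToInchi(old_mol)
new_inchi = Chem.MolToInchi(new_mol)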
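The one-liner above relies on reduce folding the binary function Chem.CombineMols over the tuple of fragments from left to right, i.e. CombineMols(CombineMols(f0, f1), f2) for three fragments; with a single fragment, reduce returns it without calling CombineMols at all. In Python 3 reduce is no longer a builtin and has to be imported from functools. The sketch below shows the same fold without RDKit, using a hypothetical combine() that stands in for CombineMols by concatenating tuples of atom labels:

```python
from functools import reduce


def combine(a, b):
    # Hypothetical stand-in for Chem.CombineMols: "molecules" are modelled
    # as tuples of atom labels, and a's atoms come first in the result.
    return a + b


# Stand-ins for the fragments returned by Chem.GetMolFrags(mol, asMols=True).
fragments = (("N", "C", "C"), ("Sb", "O"), ("Cl",))

# reduce folds left to right: combine(combine(fragments[0], fragments[1]), fragments[2])
combined = reduce(combine, fragments)
print(combined)  # ('N', 'C', 'C', 'Sb', 'O', 'Cl')

# A single-element sequence is returned as-is; combine() is never called.
single = reduce(combine, [("C",)])
print(single)  # ('C',)
```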

I've attached an SD file containing several molecules that exhibit the problem (in fact, different versions of the same compound), together with a script that demonstrates it. The real application is a custom desalting procedure, but I hope this serves as an illustration. I can provide other examples if necessary.

I'm running the RDKit 2012_12_1 release and see the same results on Mac OS X and Linux.

        Francis

--
Dr Francis L Atkinson

Chemogenomics Group
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge UK

(01223) 494473



#!/usr/bin/env python

from __future__ import print_function

from functools import reduce  # reduce is not a builtin in Python 3

from rdkit import Chem

######

old_mols = list(Chem.SDMolSupplier("404631.sdf"))

print("Checking that all input mols have same InChI: {}\n".format(len({Chem.MolToInchi(x) for x in old_mols}) == 1))

print("-" * 100)

for n, old_mol in enumerate(old_mols):

    print("Starting mol no. {}...".format(n))

    # Generate new mol by splitting into unconnected components and recombining...

    new_mol = reduce(Chem.CombineMols, Chem.GetMolFrags(old_mol, asMols=True))

    # Compare InChIs from old and new mols...

    old_inchi = Chem.MolToInchi(old_mol)
    new_inchi = Chem.MolToInchi(new_mol)

    # Mark each differing character position (compared up to the shorter string)
    differences = "".join("v" if a != b else " " for a, b in zip(old_inchi, new_inchi))

    print("{}\n{}\n{}".format(differences, old_inchi, new_inchi))

    # Passing the new mol through a molblock round trip seems to fix the problem...

    new_inchi_2 = Chem.MolToInchi(Chem.MolFromMolBlock(Chem.MolToMolBlock(new_mol)))

    print("Checking whether old and new InChIs are the same (after passage through molblock): {}".format(old_inchi == new_inchi_2))

    print("-" * 100)
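Standard InChI strings are built from "/"-separated layers (the formula, then prefixed layers such as /c connectivity, /h hydrogens, and the stereo layers /t, /m and /s), so comparing the old and new InChIs layer by layer may be more telling than the character-by-character differences line in the script. A minimal sketch, assuming standard InChIs without fixed-H (/f) or reconnected (/r) sections, whose repeated prefixes this simple split would merge; inchi_layers() and changed_layers() are hypothetical helpers, not part of RDKit:

```python
def inchi_layers(inchi):
    """Split a standard InChI into (prefix, content) pairs.

    Assumes the form 'InChI=1S/<formula>/<prefix><content>/...', where every
    field after the formula starts with a one-letter layer prefix.
    """
    fields = inchi.split("/")
    layers = [("version", fields[0]), ("formula", fields[1])]
    layers.extend((f[0], f[1:]) for f in fields[2:] if f)
    return layers


def changed_layers(old_inchi, new_inchi):
    """Return the sorted layer prefixes whose content differs between two InChIs."""
    old = dict(inchi_layers(old_inchi))
    new = dict(inchi_layers(new_inchi))
    # A layer present in only one of the two strings also counts as a change.
    return sorted(k for k in set(old) | set(new) if old.get(k) != new.get(k))
```

Inside the loop, print(changed_layers(old_inchi, new_inchi)) would then show, for example, whether the discrepancy is confined to the stereo layers.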

  Marvin  02211109112D

 17 15  0  0  0  0            999 V2000
   -0.7607  -10.6459    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0457  -10.2343    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.6692  -10.6459    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    1.3843  -10.2343    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
    2.0993  -10.6459    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    2.8142  -10.2343    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
   -1.4740  -10.2352    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.6692  -11.4731    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.3843   -9.4072    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.0993  -11.4731    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.5317  -10.6451    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.2440  -10.2326    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.8132   -9.4072    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    6.1244   -9.8084    0.0000 Sb  0  0  0  0  0  0  0  0  0  0  0  0
    5.2971   -9.8084    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    6.5359   -9.0942    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    6.5359  -10.5227    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  3  4  1  0  0  0  0
  4  9  1  1  0  0  0
  1  2  1  0  0  0  0
  5 10  1  1  0  0  0
  4  5  1  0  0  0  0
  6 11  1  0  0  0  0
 11 12  1  0  0  0  0
  5  6  1  0  0  0  0
  6 13  1  6  0  0  0
  2  3  1  0  0  0  0
  1  7  1  0  0  0  0
  3  8  1  1  0  0  0
 14 15  2  0  0  0  0
 14 16  2  0  0  0  0
 14 17  1  0  0  0  0
M  END
> <molregno>
404631

$$$$

  Marvin  01251111452D

 17 15  0  0  0  0            999 V2000
   10.5491   -5.7444    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   10.9657   -5.0258    0.0000 Sb  0  0  0  0  0  0  0  0  0  0  0  0
   10.5491   -4.3071    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   11.7949   -5.0258    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.5833   -5.3375    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.2978   -4.9250    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    4.0123   -5.3375    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.7267   -4.9250    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    5.4412   -5.3375    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    4.7267   -4.1000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    5.4412   -6.1625    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    6.1557   -4.9250    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    6.8702   -5.3375    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
    6.1557   -4.1000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    6.8702   -6.1625    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    7.5846   -4.9250    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    8.2991   -5.3375    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  8  9  1  0  0  0  0
  8 10  1  6  0  0  0
  9 11  1  6  0  0  0
  5  6  1  0  0  0  0
  9 12  1  0  0  0  0
  1  2  1  0  0  0  0
 12 13  1  0  0  0  0
  6  7  1  0  0  0  0
 12 14  1  6  0  0  0
  2  3  2  0  0  0  0
 13 15  1  1  0  0  0
  7  8  1  0  0  0  0
 13 16  1  0  0  0  0
  2  4  2  0  0  0  0
 16 17  1  0  0  0  0
M  END
> <molregno>
404631

$$$$

  Marvin  01311110422D

 17 15  0  0  0  0            999 V2000
   19.4700  -21.6663    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   19.0581  -22.3811    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
   18.2343  -22.3811    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
   17.8223  -21.6663    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
   16.9985  -21.6663    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
   16.5866  -22.3811    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   19.0581  -20.9556    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   19.4700  -23.0960    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   17.8223  -23.0960    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   18.2343  -20.9556    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   16.5866  -20.9556    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   15.7627  -22.3811    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   15.3508  -23.0960    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   21.2500  -19.9698    0.0000 Sb  0  0  0  0  0  0  0  0  0  0  0  0
   20.4250  -19.9698    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   21.6625  -19.2540    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   21.6625  -20.6815    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
  3  4  1  0  0  0  0
  4  5  1  0  0  0  0
  5  6  1  0  0  0  0
  1  7  1  0  0  0  0
  2  8  1  1  0  0  0
  3  9  1  1  0  0  0
  4 10  1  1  0  0  0
  5 11  1  6  0  0  0
 12 13  1  0  0  0  0
  6 12  1  0  0  0  0
 14 15  2  0  0  0  0
 14 16  2  0  0  0  0
 14 17  1  0  0  0  0
M  END
> <molregno>
404631

$$$$

  Marvin  08191110492D

 17 15  0  0  0  0            999 V2000
    0.3417  -24.0125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.0561  -23.6000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    1.7706  -24.0125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.4851  -23.6000    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    3.1996  -24.0125    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
    3.9140  -23.6000    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    4.6285  -24.0125    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    5.3430  -23.6000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.0574  -24.0125    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    4.6285  -24.8375    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.9140  -22.7750    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.1996  -24.8375    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.4851  -22.7750    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.6583  -25.5945    0.0000 Sb  0  0  0  0  0  0  0  0  0  0  0  0
    1.2419  -24.8783    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.2419  -26.3066    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.4829  -25.5956    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  6  7  1  0  0  0  0
  3  4  1  0  0  0  0
  7  8  1  0  0  0  0
  8  9  1  0  0  0  0
  4  5  1  0  0  0  0
  7 10  1  1  0  0  0
  2  3  1  0  0  0  0
  6 11  1  6  0  0  0
  5  6  1  0  0  0  0
  5 12  1  6  0  0  0
  1  2  1  0  0  0  0
  4 13  1  6  0  0  0
 14 16  2  0  0  0  0
 14 15  2  0  0  0  0
 14 17  1  0  0  0  0
M  END
> <molregno>
404631

$$$$
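Every record above contains two disconnected pieces, a 13-atom organic chain and a 4-atom Sb/O fragment, which is what Chem.GetMolFrags(old_mol, asMols=True) separates before CombineMols joins them again. As a hedged, RDKit-free cross-check of that split, the sketch below runs a small union-find over the bond pairs of the first record, transcribed by hand from its bond block (bond orders and stereo flags are ignored). The other three records split the same way (13 + 4 atoms), although their atom numbering differs.

```python
# Bond pairs (1-based atom indices) transcribed from the bond block of the
# first record in the attached SD file; orders and stereo flags are ignored.
BONDS = [
    (3, 4), (4, 9), (1, 2), (5, 10), (4, 5), (6, 11), (11, 12), (5, 6),
    (6, 13), (2, 3), (1, 7), (3, 8), (14, 15), (14, 16), (14, 17),
]
N_ATOMS = 17  # from the counts line " 17 15 ..."


def connected_components(n_atoms, bonds):
    """Group 1-based atom indices into connected components (union-find)."""
    parent = list(range(n_atoms + 1))  # index 0 is unused

    def find(i):
        # Path halving keeps the trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in bonds:
        parent[find(a)] = find(b)

    groups = {}
    for atom in range(1, n_atoms + 1):
        groups.setdefault(find(atom), []).append(atom)
    return sorted(groups.values())


for component in connected_components(N_ATOMS, BONDS):
    print(len(component), component)
```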
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
