Hi,

I am trying to get all substructure matches with OBSmartsPattern but
this doesn't seem to work as expected when explicit hydrogens are
involved.

The attached Python script calculates a MCS with the help of RDKit's
implementation of fmcs.  The MCS is created as SMARTS pattern (script
uses pre-computed pattern if RDKit not available) which I want to use to
find all possible matches.  I run fmcs in topological mode i.e. without
any atom or bond type information.

smi1 = 'n1c(N)c(N)ccc1'
smi2 = 'n1c(Cl)c(N)ccc1'

In the case where hydrogens are included (MCS is all atoms in smi2) I
would expect four matches with the first molecule but get only two.  The
"Cl" in the MCS can match either Ns of smi1 (I only get this without
hydrogens) and the two hydrogens added to the amino group can "flip" and
thus give two additional matches (I only get this when hydrogens added
for one Cl-to-N match).

I guess I misunderstand how this is supposed to work.  RDKit's
GetSubstructMatches() gives me the same result.


Many thanks,
Hannes.
try:
  import rdkit.Chem
  import rdkit.Chem.MCS
  have_rdkit = True
except:
  have_rdkit = False

import openbabel as ob


addH = True

smi1 = 'n1c(N)c(N)ccc1'
smi2 = 'n1c(Cl)c(N)ccc1'

if have_rdkit:
  mol1 = rdkit.Chem.MolFromSmiles(smi1)
  mol2 = rdkit.Chem.MolFromSmiles(smi2)

  if addH:
      mol1 = rdkit.Chem.AddHs(mol1)
      mol2 = rdkit.Chem.AddHs(mol2)

  mcs = rdkit.Chem.MCS.FindMCS( (mol1, mol2),
                               maximize = 'atoms',
                               atomCompare = 'any',
                               bondCompare = 'any',
                               completeRingsOnly = True)

  p = rdkit.Chem.MolFromSmarts(mcs.smarts)
  smarts = mcs.smarts

  print mcs.smarts, '\n'

  print '=== RDKit ==='

  for m1 in mol1.GetSubstructMatches(p, uniquify = False):
    print [i+1 for i in m1]

  print

  for m2 in mol2.GetSubstructMatches(p, uniquify = False):
    print [i+1 for i in m2]
else:
  if addH:
    smarts = '[*]~@1~@[*](~@[*](~@[*](~!@[*])~@[*](~!@[*])~@[*]~@1~!@[*])~!@[*](~!@[*])~!@[*])~!@[*]'
  else:
    smarts = '[*]~@1~@[*](~@[*](~@[*]~@[*]~@[*]~@1)~!@[*])~!@[*]'


print '\n=== OpenBabel ==='

conv = ob.OBConversion()
conv.SetInFormat('smiles')

obmol1 = ob.OBMol()
obmol2 = ob.OBMol()

conv.ReadString(obmol1, smi1)
conv.ReadString(obmol2, smi2)

if addH:
  obmol1.AddHydrogens(False, False, 0.0)
  obmol2.AddHydrogens(False, False, 0.0)

pat1 = ob.OBSmartsPattern()
pat2 = ob.OBSmartsPattern()

pat1.Init(smarts)
pat2.Init(smarts)

pat1.Match(obmol1)
pat2.Match(obmol2)

for m1 in pat1.GetMapList():
    print m1

print

for m2 in pat2.GetMapList():
    print m2

------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the 
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

Reply via email to