Thanks Greg.

I’ll try to isolate the issue a bit more, in the meantime.
If I can get a one-liner of a very simple custom-atom-typed smarts being 
imported as RWMol, I’ll update the issue.

Cheers,

Luca Fenu – Senior Chemoinformatician
Computational Chemist, CADD Group, Chemistry
Eli Lilly and Company Limited
Erl Wood Manor, Sunninghill Road, Windlesham, Surrey, GU20 6PH
012.764.83354 (office) | 07858.701.646 (mobile)
[email protected]<mailto:[email protected]> | 
www.lilly.co.uk<http://www.lilly.co.uk/>

[cid:[email protected]]


From: Greg Landrum [mailto:[email protected]]
Sent: 04 July 2016 06:43
To: Luca Fenu - Network
Cc: [email protected]
Subject: Re: [Rdkit-discuss] Again on getting rid of ring-atoms in MCS 
smarts/smiles

Hi Luca,

This is not a simple one and may take a bit for me to diagnose. I will try to 
do that in the next couple of days, but I'm at the Sheffield conference so it 
may take a while longer.

-greg


On Fri, Jul 1, 2016 at 6:38 PM, Luca Fenu - Network 
<[email protected]<mailto:[email protected]>> wrote:
I kept working on the issue I posted about before, and here’s the issue I 
believe may be at the bottom of what I am seeing – code snippets should be 
executable:
# the two molecules whose MCS I’m computing:
[cid:[email protected]]
from rdkit import Chem
def ring_labels (a): # similar to what seen in 
http://rdkit.org/docs/Cookbook.html#using-custom-mcs-atom-types
# using 300 here rather than 100 as Greg suggested in case some of the users 
thinks of using a heavy atom
    return 300*a.GetOwningMol().GetRingInfo().NumAtomRings(a.GetIdx()) + 
a.GetAtomicNum()
# I cheat here, and define the molecule directly from an already ‘custom-typed’ 
smarts I got out of the custom atom-typing + MCS procedure:
bigMol = 
Chem.MolFromSmiles('[8OH][306c]1[306cH][306cH][306c]([307N]2[306CH2][306CH2][307NH][306CH2][306CH2]2)[306cH][306c]1-[306c]1[306cH][306cH][306cH][306cH][306cH]1')
# define a smartsString (coming out of an MCS operation:
smartsStringA = 
'[8*]-[306*]1:[306*]:[306*]:[306*](:[306*]:[306*]:1-[306*])-[307*]1-[306*]-[306*]-[307*]-[306*]-[306*]-1'
# now obtaining back the atom indices being matched by the smarts pattern above
bigMol.GetSubstructMatch( Chem.MolFromSmarts(smartsStringA), useChirality=True)
# output # (0, 1, 2, 3, 4, 11, 12, 13, 5, 6, 7, 8, 9, 10)

# now let’s get rid of a single atom in the substructure:
# smartsStringB was generated converting to smarts an RWMol generated from 
smarstStringA
# From which the atom was excised on the basis of:
# - the custom atomtype value being above 300 and
# - the atom.IsInRing() returning False.

smartsStringB = 
'[*&8*]-,:[*&306*]1:[*&306*]:[*&306*]:[*&306*](-,:[*&307*]2:[*&306*]:[*&306*]:[*&307*]:[*&306*]:[*&306*]:2):[*&306*]:[*&306*]:1'
# smartsStringB has a few weird characters in its smarts, like ‘;’, which don’t 
seem to cause rdkit parsing to fail.

# However, they stop it from re-matching the smarts to the molecule from which 
it was derived.
bigMol.GetSubstructMatch( Chem.MolFromSmarts(smartsStringB), useChirality=True)
# output # ()

# both smartsStringA and smartsStringB can be parsed successfully by marvin and 
rdkit alike:
[cid:[email protected]]
I can manually edit the smiles or copy a portion of it from marvin to get the 
GetSubStructMatch() running as I expect, but what is the correct way to achieve 
this programmatically?

Thanks to all,

Luca Fenu – Senior Chemoinformatician
Computational Chemist, CADD Group, Chemistry
Eli Lilly and Company Limited
Erl Wood Manor, Sunninghill Road, Windlesham, Surrey, GU20 6PH
012.764.83354 (office) | 07858.701.646 (mobile)
[email protected]<mailto:[email protected]> | 
www.lilly.co.uk<http://www.lilly.co.uk/>

[cid:[email protected]]

From: Luca Fenu - Network 
[mailto:[email protected]<mailto:[email protected]>]
Sent: 29 June 2016 13:57
To: 
[email protected]<mailto:[email protected]>
Subject: [Rdkit-discuss] Again on getting rid of ring-atoms in MCS smarts/smiles

Hello again,

I’m still working on my MCS code, and following Greg’s suggestion, have tried 
to use custom atom types to get rid of atoms which are the entry point to a 
ring whose other members are not contained in the MCS.
Example: two molecules being overlaid, the MCS is shown below. I’d like to get 
rid of the N bracketed.
[cid:[email protected]]

Here’s the code I put together – much of it inspired by our previous discussion 
and the custom-atom-types how-to page. It seems to work in some cases, but 
doesn’t in others.

# begin of ring_aware_mcs code
# see http://rdkit.org/docs/Cookbook.html#using-custom-mcs-atom-types and
# https://sourceforge.net/p/rdkit/mailman/message/35170984/
# for details on this block of code
def ring_labels(a):
   # using 300 here rather than 100 as Greg suggested in case some of the users 
thinks of using a heavy atom anywhere.
    return 300*a.GetOwningMol().GetRingInfo().NumAtomRings(a.GetIdx()) + 
a.GetAtomicNum()

def getMCSSmiles(mol, labelledMol, mcs, loggero):

    mcsp = Chem.MolFromSmarts(mcs.smartsString)

    # Trying to find a way to delete every atom belonging to a ring (that is, 
with custom atomtype above 100)
    rmcsp = get_rid_of_isolated_ring_atoms(mcs, loggero)
    # if it's not directly connected to another atom in the same atomic custom 
atomtype spectrum
    match = labelledMol.GetSubstructMatch(rmcsp, useChirality=True)
    loggero.debug("got tentative match %s"%str(match))
    if match == (): # if previous step didn't work, then we go back to the old 
simpler way (which may still cause omega to fail
        loggero.warning("No match returned from rmcsp, trying using raw 
smartsString (mcsp - may contain isolated ring-atoms)")
        loggero.warning("mcsp(%s - %d atoms) versus rmcsp(%s - %d 
atoms)"%(Chem.MolToSmarts(mcsp), mcsp.GetNumAtoms(), Chem.MolToSmarts(rmcsp), 
rmcsp.GetNumAtoms()))
        match = labelledMol.GetSubstructMatch(mcsp, useChirality=True)
    loggero.debug("got match %s"%str(match))
    if mcs.smartsString is not '' and match is not ():
        return Chem.MolFragmentToSmiles(
            mol, atomsToUse=match,
            isomericSmiles=True,
            canonical=False
        )
    else:
        return ''

def get_rid_of_isolated_ring_atoms(mcs, loggero):
    '''
    this function transforms the matching smarts into an editable molecule 
(RWmol), and iterates through its atoms
    checking, if they are cyclic (isotope number > 200) if any of their 
neighbours share the ring
    (that is, have a similar isotope number) - if that is not the case, then 
the atom under exam is excised from the molecule
    and finally a new smiles is returned. There may be a more efficient way to 
do this...
    :param mcs:
    :param loggero:
    :return:
    '''
    mcsp = Chem.rdchem.RWMol(Chem.MolFromSmiles(mcs.smartsString))
    loggero.debug("***********\nmcsp contains %d atoms"%(mcsp.GetNumAtoms()))
    loggero.debug ("%s"%Chem.MolToSmiles(mcsp, isomericSmiles=True))
    for at in mcsp.GetAtoms():
        alone_in_ring = False
        ait = at.GetIsotope() # can't get back the number. why???
        loggero.debug("AtomType (%s) = %d"%(str(at), ait))
        if ait > 200: # that is, if the atom is in a ring
            alone_in_ring = True
            loggero.debug("alone_in_ring = %s"%(str(alone_in_ring)))
            for nb in at.GetNeighbors():
                nit=nb.GetIsotope()
                loggero.debug("\tn: %s, nit %d "%(str(nb), nit))
                if abs(nit - ait) < 200:
                    alone_in_ring = False
                    loggero.debug("\t * nit %d - alone_in_ring = %s"%(nit, 
str(alone_in_ring)))
                    break
        if alone_in_ring:
            mcsp.RemoveAtom(at.GetIdx())
            loggero.debug("mcsp now contains %d atoms"%(mcsp.GetNumAtoms()))
            loggero.debug("%s"%Chem.MolToSmiles(mcsp, isomericSmiles=True))
    loggero.debug("got rid of peripheral ring-atoms: %s"%Chem.MolToSmiles(mcsp, 
isomericSmiles=True))
    return Chem.MolFromSmarts(Chem.MolToSmiles(mcsp, isomericSmiles=True))

def ring_aware_mcs(molA, molB, loggero):
    nms = [Chem.Mol(x) for x in (molA, molB)]
    for nm in nms:
        old_smi = Chem.MolToSmiles(nm, isomericSmiles=True)
        for at in nm.GetAtoms():
            at.SetIsotope(ring_labels(at))
        new_smi = Chem.MolToSmiles(nm, isomericSmiles=True)
        loggero.debug( "OLD: %s\nNEW %s"%(old_smi, new_smi))
    mcs = rdFMCS.FindMCS(
        nms,
        maximizeBonds=True,
        matchValences=True,
        matchChiralTag=True,
        ringMatchesRingOnly=True, # ring should match other rings
        completeRingsOnly=True, # a ring atom is part of an MCS only if the 
whole ring is
        atomCompare=rdFMCS.AtomCompare.CompareIsotopes
    )
    mcs.smilesString = getMCSSmiles(molA, nms[0], mcs, loggero)
    return mcs
# end of ring_aware_mcs code

and a unittest snippet (the first two cases behave as expected, the second one 
doesn’t):

    def test_ring_aware_MCS(self):
        molA = Chem.MolFromSmiles("SC1CCN(C1)S(=O)(=O)c1ccccc1")
        molB = Chem.MolFromSmiles("SC1CCN(P1)S(=O)(=O)c1ccccc1")
        MCS = ring_aware_mcs(molA, molB, self.testlog)
        print "Out of ring_aware_mcs, MCS.smilesString = %s" % MCS.smilesString
        self.assertEqual(MCS.smilesString , "O=S(=O)c1ccccc1") # this works

        molA = Chem.MolFromSmiles("C(N1CCCC1)C1=CC=CC=C1")
        molB = Chem.MolFromSmiles("C(N1CCOC1)C1=CC=CC=C1")
        MCS = ring_aware_mcs(molA, molB, self.testlog)
        print "Out of ring_aware_mcs, MCS.smilesString = %s" % MCS.smilesString
        self.assertEqual(MCS.smilesString , "Cc1ccccc1") # this works too

        molC = Chem.MolFromSmiles("O=C1NC(CN2CCCC2)C(=O)N1")
        molD = Chem.MolFromSmiles("O=C1NC(CN2CCOC2)C(=O)N1")
        MCS = ring_aware_mcs(molC, molD, self.testlog)
        print "Out of ring_aware_mcs, MCS.smilesString = %s" % MCS.smilesString
        self.assertEqual(MCS.smilesString , "CC1NC(=O)NC1=O") # doesn’t work as 
expected - O=C1NC(CN)C(=O)N1 comes out instead

The elimination procedure I put in place seem to run as expected, but when the 
‘reduced’ smarts is applied to the labelled molecule for substructure matching, 
it doesn’t return a list of atom indices. If the red branch of the code (which 
uses get_rid_of_isolated_ring_atoms()) doesn’t work, the blue one takes over 
(hope the colors show) – but the MCS I obtain that way causes issues 
downstream, when I try to pass it on to third party conformer generation 
software to keep that portion of the molecule fixed.
Am I hitting a bug, or am I doing something wrong while juggling smarts, smiles 
and molecule objects?
Thanks in advance,

Luca Fenu – Computational Chemist
Computational Chemist, CADD Group, Chemistry
Eli Lilly and Company Limited
Erl Wood Manor, Sunninghill Road, Windlesham, Surrey, GU20 6PH
012.764.83354 (office) | 07858.701.646 (mobile)
[email protected]<mailto:[email protected]> | 
www.lilly.co.uk<http://www.lilly.co.uk>

[cid:[email protected]]


------------------------------------------------------------------------------
Attend Shape: An AT&T Tech Expo July 15-16. Meet us at AT&T Park in San
Francisco, CA to explore cutting-edge tech and listen to tech luminaries
present their vision of the future. This family event has something for
everyone, including kids. Get more information and register today.
http://sdm.link/attshape
_______________________________________________
Rdkit-discuss mailing list
[email protected]<mailto:[email protected]>
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

------------------------------------------------------------------------------
Attend Shape: An AT&T Tech Expo July 15-16. Meet us at AT&T Park in San
Francisco, CA to explore cutting-edge tech and listen to tech luminaries
present their vision of the future. This family event has something for
everyone, including kids. Get more information and register today.
http://sdm.link/attshape
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Reply via email to