"My initial answer, and I would love input on this, is that
three-coordinate N should always have stereochemistry removed."

Umm... even if it's a bridgehead?

-P.

On Thu, Aug 20, 2015 at 10:30 AM, Greg Landrum <[email protected]>
wrote:

> This isn't a simple one, so it may take a bit to get to an answer that's
> comprehensible.
>
> There are two things going on here in the RDKit:
> 1) Ring stereochemistry
> 2) stereochemistry about nitrogen centers
>
> Let's start with the second, because it's easier: RDKit does not generally
> "believe in" stereochemistry around three coordinate nitrogens. Here's a
> very simple example:
> In [45]: m3 = Chem.MolFromSmiles('Br[N@](F)Cl')
>
> In [46]: Chem.MolToSmiles(m3,isomericSmiles=True)
> Out[46]: 'FN(Cl)Br'
>
>
> The 3D equivalent of that:
> In [41]: m = Chem.MolFromSmiles('BrN(F)Cl')
>
> In [42]: AllChem.EmbedMolecule(m)
> Out[42]: 0
>
> In [43]: Chem.AssignAtomChiralTagsFromStructure(m)
>
> In [44]: Chem.MolToSmiles(m,isomericSmiles=True)
> Out[44]: 'FN(Cl)Br'
>
> Contrast this with what you get for a carbon:
>
> In [34]: m2 = Chem.MolFromSmiles('FC(Br)(Cl)I')
>
> In [35]: AllChem.EmbedMolecule(m2)
> Out[35]: 0
>
> In [36]: Chem.AssignAtomChiralTagsFromStructure(m2)
>
> In [37]: Chem.MolToSmiles(m2,isomericSmiles=True)
> Out[37]: 'F[C@](Cl)(Br)I'
>
>
> Back to the first: ring stereochemistry. By this I mean things like C[C@H
> ]1CC[C@@H](C)CC1 - molecules where the stereochemistry information is
> really about whether the substituents of the ring are cis or trans relative
> to the ring plane.
>
> The way the RDKit handles this is something of a hack: it doesn't identify
> those atoms as chiral centers, but it does preserve the chiral tags when
> generating a canonical SMILES:
>
> In [47]: m = Chem.MolFromSmiles('C[C@H]1CC[C@@H](C)CC1')
>
> In [48]: Chem.FindMolChiralCenters(m)
> Out[48]: []
>
> In [49]: Chem.MolToSmiles(m,isomericSmiles=True)
> Out[49]: 'C[C@H]1CC[C@@H](C)CC1'
>
> Curiously, to me at least, it does the same thing with nitrogens;
>
> In [52]: m2 = Chem.MolFromSmiles('C[N@@]1CC[C@@H](C)CC1')
>
> In [53]: Chem.MolToSmiles(m2,isomericSmiles=True)
> Out[53]: 'C[C@H]1CC[N@](C)CC1'
>
> Lest anyone think that this might make sense because being a ring makes
> inversion more difficult, that's not what is going on here. If I make the
> ring truly chiral, then the stereochemistry of the N is removed:
>
> In [54]: m3 = Chem.MolFromSmiles('C[N@@]1CO[C@@H](C)CC1')
>
> In [55]: Chem.MolToSmiles(m3,isomericSmiles=True)
> Out[55]: 'C[C@H]1CCN(C)CO1'
>
> I believe that this inconsistent behavior is a bug: either N should always
> have the input stereochemistry preserved (and that should be perceived from
> the 3D coordinates) or it should never have the input stereochemistry
> preserved. My initial answer, and I would love input on this, is that
> three-coordinate N should always have stereochemistry removed.
>
> -greg
>
>
>
> On Thu, Aug 20, 2015 at 2:22 PM, Rob Smith <[email protected]> wrote:
>
>> Hi Greg,
>>
>> I've attached the SDF that Corina generates. I'm not convinced it is a
>> problem, more an observation that I'm trying to understand.
>>
>> Looking at the results again today - it seems that from the Corina output
>> Indigo is interpreting the conformer (including whether the ethyl
>> substituent on the piperidine nitrogen is equatorial or axial) - and
>> outputting a canonical smiles string that has the conformer "encoded" in it
>> (using the chiral flags). Whereas RDKit is reading in the Corina output,
>> "discounting" whether the nitrogen is axial or equatorial (which due to
>> inversion I can understand) and interpreting it as having only two chiral
>> centers (which is correct).
>>
>> What is confusing me, is that when I supply RDKit with the canonical
>> smiles string from Indigo (which has the conformer "encoded" in it), and
>> then ask for the isomeric canonical smiles, it supplies the canonical
>> smiles with the conformer still "encoded" within it.
>>
>> For example, I read in the following canonical smiles string into
>> RDKit: CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1 (which was generated
>> by reading in one of the mols in the SD File into RDKit and output the
>> isomeric canonical smiles), running the FindMolChiralCenters on this
>> molecule, correctly reports the number of chiral centres to be 2 (6S, 9R),
>> and then asking it to output the canonical smiles string (with
>> isomericSmiles=True) gives CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1 (1).
>>
>> If I take the same mol file, read it into Indigo, and ask it to output
>> the canonical smiles string, I get: 
>> CC(C)[C@H]1CC[N@H+]1[C@@H]1CC[N@@](CC1)CC,
>> if I read this smiles string into RDKit and run FindMolCenters on it, I get
>> (3R, 6S) - which is fine, if I then out the canonical smiles (again with
>> isomericSmiles=True) I get CC[N@]1CC[C@@H]([N@@H+]2CC[C@@H]2C(C)C)CC1. I
>> expected this isomeric canonical smiles to be the same as (1), however
>> RDKit appears to conserve the conformer representation given to it from an
>> isomeric smiles string, but when reading a Mol file doesn't keep all
>> conformer information (axial or equatorial substituents on a nitrogen).
>>
>> Thanks to all for your quick (and quick witted) responses
>>
>> Rob
>>
>>
>> On Thu, Aug 20, 2015 at 3:46 AM, Greg Landrum <[email protected]>
>> wrote:
>>
>>> Hi Rob,
>>>
>>> The results below are quite strange. As John has already pointed out:
>>> there really shouldn't be chirality present on either the N+ or the C that
>>> has two methyls attached.
>>>
>>> I tried to reproduce the problem by running corina myself using the same
>>> command-line options you provided (from SMILES instead of SDF, but I don't
>>> think that should make a difference), but I get sensible results;
>>>
>>> In [5]: s = Chem.SDMolSupplier('sample.sdf')
>>>
>>> In [6]: for m in s:
>>>     Chem.AssignAtomChiralTagsFromStructure(m)
>>>     Chem.AssignStereochemistry(m,cleanIt=True,force=True)
>>>    ...:     print Chem.MolToSmiles(m,True)
>>>    ...:
>>> CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1
>>> CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1
>>> CCN1CCC([N@H+]2CC[C@@H]2C(C)C)CC1
>>> CCN1CCC([N@H+]2CC[C@@H]2C(C)C)CC1
>>> CCN1CCC([N@@H+]2CC[C@H]2C(C)C)CC1
>>> CCN1CCC([N@@H+]2CC[C@H]2C(C)C)CC1
>>> CCN1CCC([N@H+]2CC[C@H]2C(C)C)CC1
>>> CCN1CCC([N@H+]2CC[C@H]2C(C)C)CC1
>>>
>>> In [7]: s = Chem.SDMolSupplier('sample.sdf')
>>>
>>> In [8]: for m in s:
>>>     Chem.AssignAtomChiralTagsFromStructure(m)
>>>     print Chem.MolToSmiles(m,True)
>>>    ...:
>>> CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1
>>> CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1
>>> CCN1CCC([N@H+]2CC[C@@H]2C(C)C)CC1
>>> CCN1CCC([N@H+]2CC[C@@H]2C(C)C)CC1
>>> CCN1CCC([N@@H+]2CC[C@H]2C(C)C)CC1
>>> CCN1CCC([N@@H+]2CC[C@H]2C(C)C)CC1
>>> CCN1CCC([N@H+]2CC[C@H]2C(C)C)CC1
>>> CCN1CCC([N@H+]2CC[C@H]2C(C)C)CC1
>>>
>>>
>>> Could you please send the SDF that corina generates so I can try to
>>> reproduce the problem (or at least try to understand what's gong on) from
>>> that?
>>>
>>> Thanks,
>>> -greg
>>>
>>> On Wed, Aug 19, 2015 at 3:00 PM, Rob Smith <[email protected]> wrote:
>>>
>>>> Dear RDKit community,
>>>>
>>>> I'm trying to use RDKit to read in Corina generated stereoisomers (from
>>>> a Mol file), assign chiral tags and stereochemistry to the structure and
>>>> output the canonical smiles string for each isomer of a given molecule (in
>>>> Python), when I do this, half the canonical smiles strings are not unique.
>>>>
>>>> When I read in the output from Corina into an Indigo instance, then use
>>>> the canonical smiles from Indigo to create an RDKit molecule, canonical
>>>> smiles strings generated from the molecule objects are all unique.
>>>>
>>>> I may be missing an option to enable RDKit to 'visualise' the chiral
>>>> centre adjacent to the protonated nitrogen, so if someone can spot where
>>>> I've made a mistake, I'd really appreciate it. I've included the output and
>>>> Python script below. If you require any further information, please let me
>>>> know.
>>>>
>>>> Many thanks,
>>>> Rob
>>>>
>>>> Output:
>>>>
>>>> RDKit Read in of Molecule
>>>> RDKit Output -  CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1
>>>> RDKit Output -  CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1
>>>> RDKit Output -  CCN1CC[C@@H]([N@H+]2CC[C@@H]2[C@H](C)C)CC1
>>>> RDKit Output -  CCN1CC[C@@H]([N@H+]2CC[C@@H]2[C@H](C)C)CC1
>>>> RDKit Output -  CCN1CC[C@@H]([N@@H+]2CC[C@H]2[C@H](C)C)CC1
>>>> RDKit Output -  CCN1CC[C@@H]([N@@H+]2CC[C@H]2[C@H](C)C)CC1
>>>> RDKit Output -  CCN1CC[C@@H]([N@H+]2CC[C@H]2[C@H](C)C)CC1
>>>> RDKit Output -  CCN1CC[C@@H]([N@H+]2CC[C@H]2[C@H](C)C)CC1
>>>>
>>>> INDIGO Read in of Molecule
>>>> RDKit Output -  CC[N@]1CC[C@@H]([N@@H+]2CC[C@@H]2C(C)C)CC1
>>>> RDKit Output -  CC[N@]1CC[C@H]([N@@H+]2CC[C@@H]2C(C)C)CC1
>>>> RDKit Output -  CC[N@]1CC[C@@H]([N@H+]2CC[C@@H]2C(C)C)CC1
>>>> RDKit Output -  CC[N@]1CC[C@H]([N@H+]2CC[C@@H]2C(C)C)CC1
>>>> RDKit Output -  CC[N@]1CC[C@@H]([N@@H+]2CC[C@H]2C(C)C)CC1
>>>> RDKit Output -  CC[N@]1CC[C@H]([N@@H+]2CC[C@H]2C(C)C)CC1
>>>> RDKit Output -  CC[N@]1CC[C@@H]([N@H+]2CC[C@H]2C(C)C)CC1
>>>> RDKit Output -  CC[N@]1CC[C@H]([N@H+]2CC[C@H]2C(C)C)CC1
>>>>
>>>> Python script :
>>>>
>>>> from rdkit import Chem
>>>> import subprocess # Used to run Corina
>>>> from indigo import *
>>>>
>>>> def runCorinaTest(inputMol):
>>>>     indigo = Indigo()
>>>>
>>>>     molFile = Chem.MolToMolBlock(inputMol)
>>>>
>>>>     corinaCommand = "echo \'" + molFile + "\' | "
>>>>     # Then Corina - generate stereoisomers...
>>>>     corinaCommand = corinaCommand + "/apps/corina/corina -t n -d
>>>> canon,stergen,preserve,names,wh,flapn,msc=7,msi=128 -i t=sdf"
>>>>     corinaResult = subprocess.check_output([corinaCommand], shell=True)
>>>> # Gives the stereoisomer species as an SDF string
>>>>
>>>>     allMoleculeObjects = []
>>>>     allMolecules = corinaResult.split("$$$$\n") # Separate Corina
>>>> output into individual molecules
>>>>     allMolecules = allMolecules[0:len(allMolecules)-1]
>>>>
>>>>     print("RDKit Read in of Molecule")
>>>>
>>>>     for eachMolecule in allMolecules:
>>>>         eachMolecule = eachMolecule + "$$$$\n"
>>>>         mol = Chem.MolFromMolBlock(eachMolecule, sanitize=True,
>>>> removeHs=True, strictParsing=False)
>>>>         Chem.rdmolops.AssignAtomChiralTagsFromStructure(mol,
>>>> replaceExistingTags=True)
>>>>         Chem.rdmolops.AssignStereochemistry(mol)
>>>>         print("RDKit Output -  " + Chem.MolToSmiles(mol,
>>>> isomericSmiles=True))
>>>>
>>>>     print("INDIGO Read in of Molecule")
>>>>     for eachMolecule in allMolecules:
>>>>         eachMolecule = eachMolecule + "$$$$\n"
>>>>         mol = indigo.loadMolecule(eachMolecule)
>>>>         # print("Indigo Output - " + mol.canonicalSmiles())
>>>>         # Use Indigo Canonical Smiles to create RDKit molecule
>>>>         mol = Chem.MolFromSmiles(mol.canonicalSmiles())
>>>>         if mol is not None:
>>>>             print("RDKit Output -  " + Chem.MolToSmiles(mol,
>>>> isomericSmiles=True))
>>>>
>>>>     return 0
>>>>
>>>> mol = Chem.MolFromSmiles("CC(C)C1[NH+](C2CCN(CC)CC2)CC1")
>>>> z = runCorinaTest(mol)
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> _______________________________________________
>>>> Rdkit-discuss mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>>
>>>>
>>>
>>
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Reply via email to