I agree with removing it - the chance of destroying actual information this way is low. Put another way, I would expect stereo information on a three-coordinate N to be spurious in most cases.
Markus

On Thu, Aug 20, 2015 at 4:30 PM, Greg Landrum <[email protected]> wrote:
> This isn't a simple one, so it may take a bit to get to an answer that's
> comprehensible.
>
> There are two things going on here in the RDKit:
> 1) Ring stereochemistry
> 2) stereochemistry about nitrogen centers
>
> Let's start with the second, because it's easier: RDKit does not generally
> "believe in" stereochemistry around three coordinate nitrogens. Here's a
> very simple example:
> In [45]: m3 = Chem.MolFromSmiles('Br[N@](F)Cl')
>
> In [46]: Chem.MolToSmiles(m3,isomericSmiles=True)
> Out[46]: 'FN(Cl)Br'
>
>
> The 3D equivalent of that:
> In [41]: m = Chem.MolFromSmiles('BrN(F)Cl')
>
> In [42]: AllChem.EmbedMolecule(m)
> Out[42]: 0
>
> In [43]: Chem.AssignAtomChiralTagsFromStructure(m)
>
> In [44]: Chem.MolToSmiles(m,isomericSmiles=True)
> Out[44]: 'FN(Cl)Br'
>
> Contrast this with what you get for a carbon:
>
> In [34]: m2 = Chem.MolFromSmiles('FC(Br)(Cl)I')
>
> In [35]: AllChem.EmbedMolecule(m2)
> Out[35]: 0
>
> In [36]: Chem.AssignAtomChiralTagsFromStructure(m2)
>
> In [37]: Chem.MolToSmiles(m2,isomericSmiles=True)
> Out[37]: 'F[C@](Cl)(Br)I'
>
>
> Back to the first: ring stereochemistry. By this I mean things like
> C[C@H]1CC[C@@H](C)CC1 - molecules where the stereochemistry information is
> really about whether the substituents of the ring are cis or trans relative
> to the ring plane.
>
> The way the RDKit handles this is something of a hack: it doesn't identify
> those atoms as chiral centers, but it does preserve the chiral tags when
> generating a canonical SMILES:
>
> In [47]: m = Chem.MolFromSmiles('C[C@H]1CC[C@@H](C)CC1')
>
> In [48]: Chem.FindMolChiralCenters(m)
> Out[48]: []
>
> In [49]: Chem.MolToSmiles(m,isomericSmiles=True)
> Out[49]: 'C[C@H]1CC[C@@H](C)CC1'
>
> Curiously, to me at least, it does the same thing with nitrogens;
>
> In [52]: m2 = Chem.MolFromSmiles('C[N@@]1CC[C@@H](C)CC1')
>
> In [53]: Chem.MolToSmiles(m2,isomericSmiles=True)
> Out[53]: 'C[C@H]1CC[N@](C)CC1'
>
> Lest anyone think that this might make sense because being a ring makes
> inversion more difficult, that's not what is going on here. If I make the
> ring truly chiral, then the stereochemistry of the N is removed:
>
> In [54]: m3 = Chem.MolFromSmiles('C[N@@]1CO[C@@H](C)CC1')
>
> In [55]: Chem.MolToSmiles(m3,isomericSmiles=True)
> Out[55]: 'C[C@H]1CCN(C)CO1'
>
> I believe that this inconsistent behavior is a bug: either N should always
> have the input stereochemistry preserved (and that should be perceived from
> the 3D coordinates) or it should never have the input stereochemistry
> preserved. My initial answer, and I would love input on this, is that
> three-coordinate N should always have stereochemistry removed.
>
> -greg
>
>
>
> On Thu, Aug 20, 2015 at 2:22 PM, Rob Smith <[email protected]> wrote:
>>
>> Hi Greg,
>>
>> I've attached the SDF that Corina generates. I'm not convinced it is a
>> problem, more an observation that I'm trying to understand.
>>
>> Looking at the results again today - it seems that from the Corina output
>> Indigo is interpreting the conformer (including whether the ethyl
>> substituent on the piperidine nitrogen is equatorial or axial) - and
>> outputting a canonical smiles string that has the conformer "encoded" in it
>> (using the chiral flags).
>> Whereas RDKit is reading in the Corina output,
>> "discounting" whether the nitrogen is axial or equatorial (which due to
>> inversion I can understand) and interpreting it as having only two chiral
>> centers (which is correct).
>>
>> What is confusing me, is that when I supply RDKit with the canonical
>> smiles string from Indigo (which has the conformer "encoded" in it), and
>> then ask for the isomeric canonical smiles, it supplies the canonical smiles
>> with the conformer still "encoded" within it.
>>
>> For example, I read in the following canonical smiles string into RDKit:
>> CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1 (which was generated by reading
>> in one of the mols in the SD File into RDKit and output the isomeric
>> canonical smiles), running the FindMolChiralCenters on this molecule,
>> correctly reports the number of chiral centres to be 2 (6S, 9R), and then
>> asking it to output the canonical smiles string (with isomericSmiles=True)
>> gives CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1 (1).
>>
>> If I take the same mol file, read it into Indigo, and ask it to output the
>> canonical smiles string, I get: CC(C)[C@H]1CC[N@H+]1[C@@H]1CC[N@@](CC1)CC,
>> if I read this smiles string into RDKit and run FindMolChiralCenters on it,
>> I get (3R, 6S) - which is fine, if I then output the canonical smiles (again
>> with isomericSmiles=True) I get CC[N@]1CC[C@@H]([N@@H+]2CC[C@@H]2C(C)C)CC1. I
>> expected this isomeric canonical smiles to be the same as (1), however RDKit
>> appears to conserve the conformer representation given to it from an
>> isomeric smiles string, but when reading a Mol file doesn't keep all
>> conformer information (axial or equatorial substituents on a nitrogen).
>>
>> Thanks to all for your quick (and quick witted) responses
>>
>> Rob
>>
>>
>> On Thu, Aug 20, 2015 at 3:46 AM, Greg Landrum <[email protected]>
>> wrote:
>>>
>>> Hi Rob,
>>>
>>> The results below are quite strange.
>>> As John has already pointed out:
>>> there really shouldn't be chirality present on either the N+ or the C that
>>> has two methyls attached.
>>>
>>> I tried to reproduce the problem by running corina myself using the same
>>> command-line options you provided (from SMILES instead of SDF, but I don't
>>> think that should make a difference), but I get sensible results;
>>>
>>> In [5]: s = Chem.SDMolSupplier('sample.sdf')
>>>
>>> In [6]: for m in s:
>>>    ...:     Chem.AssignAtomChiralTagsFromStructure(m)
>>>    ...:     Chem.AssignStereochemistry(m,cleanIt=True,force=True)
>>>    ...:     print Chem.MolToSmiles(m,True)
>>>    ...:
>>> CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1
>>> CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1
>>> CCN1CCC([N@H+]2CC[C@@H]2C(C)C)CC1
>>> CCN1CCC([N@H+]2CC[C@@H]2C(C)C)CC1
>>> CCN1CCC([N@@H+]2CC[C@H]2C(C)C)CC1
>>> CCN1CCC([N@@H+]2CC[C@H]2C(C)C)CC1
>>> CCN1CCC([N@H+]2CC[C@H]2C(C)C)CC1
>>> CCN1CCC([N@H+]2CC[C@H]2C(C)C)CC1
>>>
>>> In [7]: s = Chem.SDMolSupplier('sample.sdf')
>>>
>>> In [8]: for m in s:
>>>    ...:     Chem.AssignAtomChiralTagsFromStructure(m)
>>>    ...:     print Chem.MolToSmiles(m,True)
>>>    ...:
>>> CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1
>>> CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1
>>> CCN1CCC([N@H+]2CC[C@@H]2C(C)C)CC1
>>> CCN1CCC([N@H+]2CC[C@@H]2C(C)C)CC1
>>> CCN1CCC([N@@H+]2CC[C@H]2C(C)C)CC1
>>> CCN1CCC([N@@H+]2CC[C@H]2C(C)C)CC1
>>> CCN1CCC([N@H+]2CC[C@H]2C(C)C)CC1
>>> CCN1CCC([N@H+]2CC[C@H]2C(C)C)CC1
>>>
>>>
>>> Could you please send the SDF that corina generates so I can try to
>>> reproduce the problem (or at least try to understand what's going on) from
>>> that?
>>>
>>> Thanks,
>>> -greg
>>>
>>> On Wed, Aug 19, 2015 at 3:00 PM, Rob Smith <[email protected]> wrote:
>>>>
>>>> Dear RDKit community,
>>>>
>>>> I'm trying to use RDKit to read in Corina generated stereoisomers (from
>>>> a Mol file), assign chiral tags and stereochemistry to the structure and
>>>> output the canonical smiles string for each isomer of a given molecule (in
>>>> Python), when I do this, half the canonical smiles strings are not unique.
>>>>
>>>> When I read in the output from Corina into an Indigo instance, then use
>>>> the canonical smiles from Indigo to create an RDKit molecule, canonical
>>>> smiles strings generated from the molecule objects are all unique.
>>>>
>>>> I may be missing an option to enable RDKit to 'visualise' the chiral
>>>> centre adjacent to the protonated nitrogen, so if someone can spot where
>>>> I've made a mistake, I'd really appreciate it. I've included the output and
>>>> Python script below. If you require any further information, please let me
>>>> know.
>>>>
>>>> Many thanks,
>>>> Rob
>>>>
>>>> Output:
>>>>
>>>> RDKit Read in of Molecule
>>>> RDKit Output - CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1
>>>> RDKit Output - CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1
>>>> RDKit Output - CCN1CC[C@@H]([N@H+]2CC[C@@H]2[C@H](C)C)CC1
>>>> RDKit Output - CCN1CC[C@@H]([N@H+]2CC[C@@H]2[C@H](C)C)CC1
>>>> RDKit Output - CCN1CC[C@@H]([N@@H+]2CC[C@H]2[C@H](C)C)CC1
>>>> RDKit Output - CCN1CC[C@@H]([N@@H+]2CC[C@H]2[C@H](C)C)CC1
>>>> RDKit Output - CCN1CC[C@@H]([N@H+]2CC[C@H]2[C@H](C)C)CC1
>>>> RDKit Output - CCN1CC[C@@H]([N@H+]2CC[C@H]2[C@H](C)C)CC1
>>>>
>>>> INDIGO Read in of Molecule
>>>> RDKit Output - CC[N@]1CC[C@@H]([N@@H+]2CC[C@@H]2C(C)C)CC1
>>>> RDKit Output - CC[N@]1CC[C@H]([N@@H+]2CC[C@@H]2C(C)C)CC1
>>>> RDKit Output - CC[N@]1CC[C@@H]([N@H+]2CC[C@@H]2C(C)C)CC1
>>>> RDKit Output - CC[N@]1CC[C@H]([N@H+]2CC[C@@H]2C(C)C)CC1
>>>> RDKit Output - CC[N@]1CC[C@@H]([N@@H+]2CC[C@H]2C(C)C)CC1
>>>> RDKit Output - CC[N@]1CC[C@H]([N@@H+]2CC[C@H]2C(C)C)CC1
>>>> RDKit Output - CC[N@]1CC[C@@H]([N@H+]2CC[C@H]2C(C)C)CC1
>>>> RDKit Output - CC[N@]1CC[C@H]([N@H+]2CC[C@H]2C(C)C)CC1
>>>>
>>>> Python script :
>>>>
>>>> from rdkit import Chem
>>>> import subprocess # Used to run Corina
>>>> from indigo import *
>>>>
>>>> def runCorinaTest(inputMol):
>>>>     indigo = Indigo()
>>>>
>>>>     molFile = Chem.MolToMolBlock(inputMol)
>>>>
>>>>     corinaCommand = "echo \'" + molFile + "\' | "
>>>>     # Then Corina - generate stereoisomers...
>>>>     corinaCommand = corinaCommand + "/apps/corina/corina -t n -d canon,stergen,preserve,names,wh,flapn,msc=7,msi=128 -i t=sdf"
>>>>     corinaResult = subprocess.check_output([corinaCommand], shell=True)
>>>>     # Gives the stereoisomer species as an SDF string
>>>>
>>>>     allMoleculeObjects = []
>>>>     allMolecules = corinaResult.split("$$$$\n") # Separate Corina output into individual molecules
>>>>     allMolecules = allMolecules[0:len(allMolecules)-1]
>>>>
>>>>     print("RDKit Read in of Molecule")
>>>>
>>>>     for eachMolecule in allMolecules:
>>>>         eachMolecule = eachMolecule + "$$$$\n"
>>>>         mol = Chem.MolFromMolBlock(eachMolecule, sanitize=True, removeHs=True, strictParsing=False)
>>>>         Chem.rdmolops.AssignAtomChiralTagsFromStructure(mol, replaceExistingTags=True)
>>>>         Chem.rdmolops.AssignStereochemistry(mol)
>>>>         print("RDKit Output - " + Chem.MolToSmiles(mol, isomericSmiles=True))
>>>>
>>>>     print("INDIGO Read in of Molecule")
>>>>     for eachMolecule in allMolecules:
>>>>         eachMolecule = eachMolecule + "$$$$\n"
>>>>         mol = indigo.loadMolecule(eachMolecule)
>>>>         # print("Indigo Output - " + mol.canonicalSmiles())
>>>>         # Use Indigo Canonical Smiles to create RDKit molecule
>>>>         mol = Chem.MolFromSmiles(mol.canonicalSmiles())
>>>>         if mol is not None:
>>>>             print("RDKit Output - " + Chem.MolToSmiles(mol, isomericSmiles=True))
>>>>
>>>>     return 0
>>>>
>>>> mol = Chem.MolFromSmiles("CC(C)C1[NH+](C2CCN(CC)CC2)CC1")
>>>> z = runCorinaTest(mol)
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> _______________________________________________
>>>> Rdkit-discuss mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>>
>>>
>>
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
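A footnote for anyone following the thread: the duplication Rob describes in his first message can be checked from the pasted SMILES alone, without RDKit, Indigo or Corina installed. The sketch below only counts distinct strings per route. The strings are copied from the "Output:" block of that message; the names rdkit_route, indigo_route and summarize are made up here for illustration and do not come from the thread or from any library.

```python
from collections import Counter

# Copied from the "RDKit Read in of Molecule" block of Rob's first message.
rdkit_route = [
    "CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@H+]2CC[C@@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@H+]2CC[C@@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@@H+]2CC[C@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@@H+]2CC[C@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@H+]2CC[C@H]2[C@H](C)C)CC1",
    "CCN1CC[C@@H]([N@H+]2CC[C@H]2[C@H](C)C)CC1",
]

# Copied from the "INDIGO Read in of Molecule" block of the same message.
indigo_route = [
    "CC[N@]1CC[C@@H]([N@@H+]2CC[C@@H]2C(C)C)CC1",
    "CC[N@]1CC[C@H]([N@@H+]2CC[C@@H]2C(C)C)CC1",
    "CC[N@]1CC[C@@H]([N@H+]2CC[C@@H]2C(C)C)CC1",
    "CC[N@]1CC[C@H]([N@H+]2CC[C@@H]2C(C)C)CC1",
    "CC[N@]1CC[C@@H]([N@@H+]2CC[C@H]2C(C)C)CC1",
    "CC[N@]1CC[C@H]([N@@H+]2CC[C@H]2C(C)C)CC1",
    "CC[N@]1CC[C@@H]([N@H+]2CC[C@H]2C(C)C)CC1",
    "CC[N@]1CC[C@H]([N@H+]2CC[C@H]2C(C)C)CC1",
]


def summarize(label, smiles_list):
    """Print how often each string occurs and return the number of distinct strings."""
    counts = Counter(smiles_list)
    print(f"{label}: {len(counts)} unique out of {len(smiles_list)}")
    for smi, n in counts.items():
        print(f"  {n}x {smi}")
    return len(counts)


n_rdkit = summarize("Mol block -> RDKit", rdkit_route)                    # 4 unique
n_indigo = summarize("Mol block -> Indigo SMILES -> RDKit", indigo_route)  # 8 unique
```

In the Indigo-derived list the [N@] tag on the piperidine nitrogen is the same in every string, and neighbouring entries differ only in the tag on the piperidine ring carbon ([C@@H] versus [C@H]). That is the ring cis/trans information Greg describes, so this is plain string comparison and says nothing about which of the two behaviours is chemically right.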

