Hmm, well - probably not, you mention the always present exception in chemistry, Peter (Sulfoxides have a similar situation, stereochemistry from lone pairs). But generally I still think it is more dangerous to keep or even perceive (from 3D) stereochemistry on three-coordinated N - you will do more harm with this than fix things.
On Thu, Aug 20, 2015 at 6:40 PM, Peter Shenkin <[email protected]> wrote: > "My initial answer, and I would love input on this, is that three-coordinate > N should always have stereochemistry removed." > > Umm... even if it's a bridgehead? > > -P. > > On Thu, Aug 20, 2015 at 10:30 AM, Greg Landrum <[email protected]> > wrote: >> >> This isn't a simple one, so it may take a bit to get to an answer that's >> comprehensible. >> >> There are two things going on here in the RDKit: >> 1) Ring stereochemistry >> 2) stereochemistry about nitrogen centers >> >> Let's start with the second, because it's easier: RDKit does not generally >> "believe in" stereochemistry around three coordinate nitrogens. Here's a >> very simple example: >> In [45]: m3 = Chem.MolFromSmiles('Br[N@](F)Cl') >> >> In [46]: Chem.MolToSmiles(m3,isomericSmiles=True) >> Out[46]: 'FN(Cl)Br' >> >> >> The 3D equivalent of that: >> In [41]: m = Chem.MolFromSmiles('BrN(F)Cl') >> >> In [42]: AllChem.EmbedMolecule(m) >> Out[42]: 0 >> >> In [43]: Chem.AssignAtomChiralTagsFromStructure(m) >> >> In [44]: Chem.MolToSmiles(m,isomericSmiles=True) >> Out[44]: 'FN(Cl)Br' >> >> Contrast this with what you get for a carbon: >> >> In [34]: m2 = Chem.MolFromSmiles('FC(Br)(Cl)I') >> >> In [35]: AllChem.EmbedMolecule(m2) >> Out[35]: 0 >> >> In [36]: Chem.AssignAtomChiralTagsFromStructure(m2) >> >> In [37]: Chem.MolToSmiles(m2,isomericSmiles=True) >> Out[37]: 'F[C@](Cl)(Br)I' >> >> >> Back to the first: ring stereochemistry. By this I mean things like >> C[C@H]1CC[C@@H](C)CC1 - molecules where the stereochemistry information is >> really about whether the substituents of the ring are cis or trans relative >> to the ring plane. >> >> The way the RDKit handles this is something of a hack: it doesn't identify >> those atoms as chiral centers, but it does preserve the chiral tags when >> generating a canonical SMILES: >> >> In [47]: m = Chem.MolFromSmiles('C[C@H]1CC[C@@H](C)CC1') >> >> In [48]: Chem.FindMolChiralCenters(m) >> Out[48]: [] >> >> In [49]: Chem.MolToSmiles(m,isomericSmiles=True) >> Out[49]: 'C[C@H]1CC[C@@H](C)CC1' >> >> Curiously, to me at least, it does the same thing with nitrogens; >> >> In [52]: m2 = Chem.MolFromSmiles('C[N@@]1CC[C@@H](C)CC1') >> >> In [53]: Chem.MolToSmiles(m2,isomericSmiles=True) >> Out[53]: 'C[C@H]1CC[N@](C)CC1' >> >> Lest anyone think that this might make sense because being a ring makes >> inversion more difficult, that's not what is going on here. If I make the >> ring truly chiral, then the stereochemistry of the N is removed: >> >> In [54]: m3 = Chem.MolFromSmiles('C[N@@]1CO[C@@H](C)CC1') >> >> In [55]: Chem.MolToSmiles(m3,isomericSmiles=True) >> Out[55]: 'C[C@H]1CCN(C)CO1' >> >> I believe that this inconsistent behavior is a bug: either N should always >> have the input stereochemistry preserved (and that should be perceived from >> the 3D coordinates) or it should never have the input stereochemistry >> preserved. My initial answer, and I would love input on this, is that >> three-coordinate N should always have stereochemistry removed. >> >> -greg >> >> >> >> On Thu, Aug 20, 2015 at 2:22 PM, Rob Smith <[email protected]> wrote: >>> >>> Hi Greg, >>> >>> I've attached the SDF that Corina generates. I'm not convinced it is a >>> problem, more an observation that I'm trying to understand. >>> >>> Looking at the results again today - it seems that from the Corina output >>> Indigo is interpreting the conformer (including whether the ethyl >>> substituent on the piperidine nitrogen is equatorial or axial) - and >>> outputting a canonical smiles string that has the conformer "encoded" in it >>> (using the chiral flags). Whereas RDKit is reading in the Corina output, >>> "discounting" whether the nitrogen is axial or equatorial (which due to >>> inversion I can understand) and interpreting it as having only two chiral >>> centers (which is correct). >>> >>> What is confusing me, is that when I supply RDKit with the canonical >>> smiles string from Indigo (which has the conformer "encoded" in it), and >>> then ask for the isomeric canonical smiles, it supplies the canonical smiles >>> with the conformer still "encoded" within it. >>> >>> For example, I read in the following canonical smiles string into RDKit: >>> CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1 (which was generated by reading >>> in one of the mols in the SD File into RDKit and output the isomeric >>> canonical smiles), running the FindMolChiralCenters on this molecule, >>> correctly reports the number of chiral centres to be 2 (6S, 9R), and then >>> asking it to output the canonical smiles string (with isomericSmiles=True) >>> gives CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1 (1). >>> >>> If I take the same mol file, read it into Indigo, and ask it to output >>> the canonical smiles string, I get: >>> CC(C)[C@H]1CC[N@H+]1[C@@H]1CC[N@@](CC1)CC, if I read this smiles string into >>> RDKit and run FindMolCenters on it, I get (3R, 6S) - which is fine, if I >>> then out the canonical smiles (again with isomericSmiles=True) I get >>> CC[N@]1CC[C@@H]([N@@H+]2CC[C@@H]2C(C)C)CC1. I expected this isomeric >>> canonical smiles to be the same as (1), however RDKit appears to conserve >>> the conformer representation given to it from an isomeric smiles string, but >>> when reading a Mol file doesn't keep all conformer information (axial or >>> equatorial substituents on a nitrogen). >>> >>> Thanks to all for your quick (and quick witted) responses >>> >>> Rob >>> >>> >>> On Thu, Aug 20, 2015 at 3:46 AM, Greg Landrum <[email protected]> >>> wrote: >>>> >>>> Hi Rob, >>>> >>>> The results below are quite strange. As John has already pointed out: >>>> there really shouldn't be chirality present on either the N+ or the C that >>>> has two methyls attached. >>>> >>>> I tried to reproduce the problem by running corina myself using the same >>>> command-line options you provided (from SMILES instead of SDF, but I don't >>>> think that should make a difference), but I get sensible results; >>>> >>>> In [5]: s = Chem.SDMolSupplier('sample.sdf') >>>> >>>> In [6]: for m in s: >>>> Chem.AssignAtomChiralTagsFromStructure(m) >>>> Chem.AssignStereochemistry(m,cleanIt=True,force=True) >>>> ...: print Chem.MolToSmiles(m,True) >>>> ...: >>>> CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1 >>>> CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1 >>>> CCN1CCC([N@H+]2CC[C@@H]2C(C)C)CC1 >>>> CCN1CCC([N@H+]2CC[C@@H]2C(C)C)CC1 >>>> CCN1CCC([N@@H+]2CC[C@H]2C(C)C)CC1 >>>> CCN1CCC([N@@H+]2CC[C@H]2C(C)C)CC1 >>>> CCN1CCC([N@H+]2CC[C@H]2C(C)C)CC1 >>>> CCN1CCC([N@H+]2CC[C@H]2C(C)C)CC1 >>>> >>>> In [7]: s = Chem.SDMolSupplier('sample.sdf') >>>> >>>> In [8]: for m in s: >>>> Chem.AssignAtomChiralTagsFromStructure(m) >>>> print Chem.MolToSmiles(m,True) >>>> ...: >>>> CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1 >>>> CCN1CCC([N@@H+]2CC[C@@H]2C(C)C)CC1 >>>> CCN1CCC([N@H+]2CC[C@@H]2C(C)C)CC1 >>>> CCN1CCC([N@H+]2CC[C@@H]2C(C)C)CC1 >>>> CCN1CCC([N@@H+]2CC[C@H]2C(C)C)CC1 >>>> CCN1CCC([N@@H+]2CC[C@H]2C(C)C)CC1 >>>> CCN1CCC([N@H+]2CC[C@H]2C(C)C)CC1 >>>> CCN1CCC([N@H+]2CC[C@H]2C(C)C)CC1 >>>> >>>> >>>> Could you please send the SDF that corina generates so I can try to >>>> reproduce the problem (or at least try to understand what's gong on) from >>>> that? >>>> >>>> Thanks, >>>> -greg >>>> >>>> On Wed, Aug 19, 2015 at 3:00 PM, Rob Smith <[email protected]> wrote: >>>>> >>>>> Dear RDKit community, >>>>> >>>>> I'm trying to use RDKit to read in Corina generated stereoisomers (from >>>>> a Mol file), assign chiral tags and stereochemistry to the structure and >>>>> output the canonical smiles string for each isomer of a given molecule (in >>>>> Python), when I do this, half the canonical smiles strings are not unique. >>>>> >>>>> When I read in the output from Corina into an Indigo instance, then use >>>>> the canonical smiles from Indigo to create an RDKit molecule, canonical >>>>> smiles strings generated from the molecule objects are all unique. >>>>> >>>>> I may be missing an option to enable RDKit to 'visualise' the chiral >>>>> centre adjacent to the protonated nitrogen, so if someone can spot where >>>>> I've made a mistake, I'd really appreciate it. I've included the output >>>>> and >>>>> Python script below. If you require any further information, please let me >>>>> know. >>>>> >>>>> Many thanks, >>>>> Rob >>>>> >>>>> Output: >>>>> >>>>> RDKit Read in of Molecule >>>>> RDKit Output - CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1 >>>>> RDKit Output - CCN1CC[C@@H]([N@@H+]2CC[C@@H]2[C@H](C)C)CC1 >>>>> RDKit Output - CCN1CC[C@@H]([N@H+]2CC[C@@H]2[C@H](C)C)CC1 >>>>> RDKit Output - CCN1CC[C@@H]([N@H+]2CC[C@@H]2[C@H](C)C)CC1 >>>>> RDKit Output - CCN1CC[C@@H]([N@@H+]2CC[C@H]2[C@H](C)C)CC1 >>>>> RDKit Output - CCN1CC[C@@H]([N@@H+]2CC[C@H]2[C@H](C)C)CC1 >>>>> RDKit Output - CCN1CC[C@@H]([N@H+]2CC[C@H]2[C@H](C)C)CC1 >>>>> RDKit Output - CCN1CC[C@@H]([N@H+]2CC[C@H]2[C@H](C)C)CC1 >>>>> >>>>> INDIGO Read in of Molecule >>>>> RDKit Output - CC[N@]1CC[C@@H]([N@@H+]2CC[C@@H]2C(C)C)CC1 >>>>> RDKit Output - CC[N@]1CC[C@H]([N@@H+]2CC[C@@H]2C(C)C)CC1 >>>>> RDKit Output - CC[N@]1CC[C@@H]([N@H+]2CC[C@@H]2C(C)C)CC1 >>>>> RDKit Output - CC[N@]1CC[C@H]([N@H+]2CC[C@@H]2C(C)C)CC1 >>>>> RDKit Output - CC[N@]1CC[C@@H]([N@@H+]2CC[C@H]2C(C)C)CC1 >>>>> RDKit Output - CC[N@]1CC[C@H]([N@@H+]2CC[C@H]2C(C)C)CC1 >>>>> RDKit Output - CC[N@]1CC[C@@H]([N@H+]2CC[C@H]2C(C)C)CC1 >>>>> RDKit Output - CC[N@]1CC[C@H]([N@H+]2CC[C@H]2C(C)C)CC1 >>>>> >>>>> Python script : >>>>> >>>>> from rdkit import Chem >>>>> import subprocess # Used to run Corina >>>>> from indigo import * >>>>> >>>>> def runCorinaTest(inputMol): >>>>> indigo = Indigo() >>>>> >>>>> molFile = Chem.MolToMolBlock(inputMol) >>>>> >>>>> corinaCommand = "echo \'" + molFile + "\' | " >>>>> # Then Corina - generate stereoisomers... >>>>> corinaCommand = corinaCommand + "/apps/corina/corina -t n -d >>>>> canon,stergen,preserve,names,wh,flapn,msc=7,msi=128 -i t=sdf" >>>>> corinaResult = subprocess.check_output([corinaCommand], shell=True) >>>>> # Gives the stereoisomer species as an SDF string >>>>> >>>>> allMoleculeObjects = [] >>>>> allMolecules = corinaResult.split("$$$$\n") # Separate Corina >>>>> output into individual molecules >>>>> allMolecules = allMolecules[0:len(allMolecules)-1] >>>>> >>>>> print("RDKit Read in of Molecule") >>>>> >>>>> for eachMolecule in allMolecules: >>>>> eachMolecule = eachMolecule + "$$$$\n" >>>>> mol = Chem.MolFromMolBlock(eachMolecule, sanitize=True, >>>>> removeHs=True, strictParsing=False) >>>>> Chem.rdmolops.AssignAtomChiralTagsFromStructure(mol, >>>>> replaceExistingTags=True) >>>>> Chem.rdmolops.AssignStereochemistry(mol) >>>>> print("RDKit Output - " + Chem.MolToSmiles(mol, >>>>> isomericSmiles=True)) >>>>> >>>>> print("INDIGO Read in of Molecule") >>>>> for eachMolecule in allMolecules: >>>>> eachMolecule = eachMolecule + "$$$$\n" >>>>> mol = indigo.loadMolecule(eachMolecule) >>>>> # print("Indigo Output - " + mol.canonicalSmiles()) >>>>> # Use Indigo Canonical Smiles to create RDKit molecule >>>>> mol = Chem.MolFromSmiles(mol.canonicalSmiles()) >>>>> if mol is not None: >>>>> print("RDKit Output - " + Chem.MolToSmiles(mol, >>>>> isomericSmiles=True)) >>>>> >>>>> return 0 >>>>> >>>>> mol = Chem.MolFromSmiles("CC(C)C1[NH+](C2CCN(CC)CC2)CC1") >>>>> z = runCorinaTest(mol) >>>>> >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> _______________________________________________ >>>>> Rdkit-discuss mailing list >>>>> [email protected] >>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss >>>>> >>>> >>> >> >> >> >> ------------------------------------------------------------------------------ >> >> _______________________________________________ >> Rdkit-discuss mailing list >> [email protected] >> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss >> > > > ------------------------------------------------------------------------------ > > _______________________________________________ > Rdkit-discuss mailing list > [email protected] > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss > ------------------------------------------------------------------------------ _______________________________________________ Rdkit-discuss mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

