On Jun 16, 2015, at 10:20 PM, Peter Shenkin wrote:
> [N-]=[N+]=NC(=O)N1C(=O)N([N+]([O-])=O)C2(C13C4=C56)C4=C5C2=C36
> [N-]=[N+]=NC(=O)N(C(=O)N1[N+]([O-])=O)C(c23)(c4c56)C16c3c5c24
>
> rdkit canonicalizes the two to the following, respectively:
>
> [N-]=[N+]=NC(=O)N1C(=O)N([N+](=O)[O-])C23c4c5c2c2c-5c4C213
> [N-]=[N+]=NC(=O)N1C(=O)N([N+](=O)[O-])C23c4c5c6c(c2c4=6)C513
> I believe these represent the same structure, with the following caveat:
>
> It is not impossible that the two SMILES actually code for different
> structures in some subtle way. I've tried visualizing them in several
> packages, however, and I've not been able to find a difference.

I've found SMARTSviewer to be an excellent way to help resolve these problems,
because it doesn't try to do any aromaticity perception. Oddly, though, it
fails on the second of the four SMILES, saying "SMARTS syntax is not correct".
I can't figure out why.

BTW, to help it out, you can ask RDKit to write every bond explicitly.
Otherwise the output leaves most bonds implicit, and SMARTS reads an implicit
bond as "single-or-aromatic".

>>> from rdkit import Chem
>>> mol1 = Chem.MolFromSmiles("[N-]=[N+]=NC(=O)N1C(=O)N([N+]([O-])=O)C2(C13C4=C56)C4=C5C2=C36")
>>> Chem.MolToSmiles(mol1, allBondsExplicit=True)
'[N-]=[N+]=N-C(=O)-N1-C(=O)-N(-[N+](=O)-[O-])-C23-c4:c5:c-2:c2:c-5:c:4-C-2-1-3'
>>> mol2 = Chem.MolFromSmiles("[N-]=[N+]=NC(=O)N(C(=O)N1[N+]([O-])=O)C(c23)(c4c56)C16c3c5c24")
>>> Chem.MolToSmiles(mol2, allBondsExplicit=True)
'[N-]=[N+]=N-C(=O)-N1-C(=O)-N(-[N+](=O)-[O-])-C23-c4:c5:c6:c(:c-2:c:4=6)-C-5-1-3'

I don't know how it is that RDKit adds a double bond to the second cubane,
given only aromatic carbons and single-or-aromatic bonds in the original
SMILES.
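
One rough way to see where the two explicit-bond strings above differ is to
tally their bond symbols. The sketch below is plain Python with no RDKit and is
not a SMILES parser. It assumes every bond is written exactly once, outside
square brackets, which appears to hold for the two strings quoted above. The
helper name tally_bond_symbols is made up for this illustration.

```python
import re
from collections import Counter

# The two allBondsExplicit=True strings quoted in the session above.
SMILES_1 = "[N-]=[N+]=N-C(=O)-N1-C(=O)-N(-[N+](=O)-[O-])-C23-c4:c5:c-2:c2:c-5:c:4-C-2-1-3"
SMILES_2 = "[N-]=[N+]=N-C(=O)-N1-C(=O)-N(-[N+](=O)-[O-])-C23-c4:c5:c6:c(:c-2:c:4=6)-C-5-1-3"


def tally_bond_symbols(smiles):
    """Count '-', '=', '#' and ':' characters that sit outside bracket atoms.

    Hypothetical helper: a naive character count, valid only under the
    assumption that each bond symbol appears exactly once in the string.
    """
    # Replace bracket atoms such as [N-] or [O-] with a placeholder so that
    # charge signs are not mistaken for single-bond symbols.
    stripped = re.sub(r"\[[^\]]*\]", "*", smiles)
    return Counter(ch for ch in stripped if ch in "-=#:")


counts_1 = tally_bond_symbols(SMILES_1)
counts_2 = tally_bond_symbols(SMILES_2)

for label, counts in (("first", counts_1), ("second", counts_2)):
    print(label, "single:", counts["-"], "double:", counts["="],
          "aromatic:", counts[":"])
```

Both strings have 25 bond symbols in total, but the second has one more '='
and one fewer '-' than the first, which is the extra double bond noted above.
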
Andrew
[email protected]
------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss