Hi Ivan,

Short answer: I would not normally expect a second sanitization to fail if the first succeeds, but your input SMILES is very odd and triggers a bug.
This is an interesting edge case for the sanitization code because it includes a weird mix of aromatic and aliphatic atoms and bonds. I do hope this came out of some computational process and isn't a "real" molecule. You almost couldn't have picked a better example to highlight the situation that's causing the problem here. Some form of congratulations is in order. :-)

Here's an explanation of what's going on with your molecule:

C1=n(C)-c=Cn1

The fundamental problem is that atom 1 (the first nitrogen) has a valence of 4 and is neutral. If you wrote the SMILES as C1=N(C)C=CN1, which is what the sanitization process produces, I don't think you'd be surprised that the RDKit sanitization fails (and your second call to sanitize does fail).

To understand why it passes the first time, you need to understand the flow of the sanitization process, described here:
https://www.rdkit.org/docs/RDKit_Book.html#molecular-sanitization

Step 3, updatePropertyCache(), is the part that reports valency errors. There's a special case in this code for aromatic atoms that allows atoms like the N in Cn1cccc1 to pass sanitization even though they are formally four-valent (2 x 1.5 for the aromatic bonds + 1 for the C). Your molecule triggers that special case because atom 1 is aromatic in the input SMILES.

Incorrect aromatic rings that get through this step normally end up getting caught later when the molecule is kekulized (step 5). In your case there are no aromatic bonds to kekulize, so no error is thrown. The aromaticity perception (step 6) does not consider the ring to be aromatic, so the final molecule is the equivalent of C1=N(C)C=CN1.

It ought to be possible to clear this up in the sanitization code relatively easily; I just need to think about it a bit and do a bunch of testing.
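As a rough illustration of the arithmetic above, here is a small plain-Python sketch (it does not call RDKit). The two bond tables are transcribed by hand from the SMILES, and the names ODD_RING, N_METHYLPYRROLE, bond_order_sum and aromatic_bond_count are made up for this example. It only shows that the nitrogen at index 1 adds up to 4 in both molecules, while only the N-methylpyrrole ring has aromatic bonds for a kekulization step to look at.

```python
# Bond tables transcribed by hand from the two SMILES; this is an assumption
# of the sketch, not RDKit output. Each entry is (begin_atom, end_atom, order),
# where an order of 1.5 marks an aromatic bond.
ODD_RING = [  # C1=n(C)-c=Cn1
    (0, 1, 2.0),  # C1=n   explicit double bond
    (1, 2, 1.0),  # n(C)   methyl substituent
    (1, 3, 1.0),  # n-c    explicit single bond
    (3, 4, 2.0),  # c=C    explicit double bond
    (4, 5, 1.0),  # Cn     one end is aliphatic, so taken as single
    (5, 0, 1.0),  # ring closure between n1 and C1, also taken as single
]

N_METHYLPYRROLE = [  # Cn1cccc1
    (0, 1, 1.0),  # methyl on the ring nitrogen
    (1, 2, 1.5),
    (2, 3, 1.5),
    (3, 4, 1.5),
    (4, 5, 1.5),
    (5, 1, 1.5),  # ring closure
]


def bond_order_sum(bonds, atom_idx):
    """Add up the orders of all bonds that touch atom_idx."""
    return sum(order for a, b, order in bonds if atom_idx in (a, b))


def aromatic_bond_count(bonds):
    """Count the bonds flagged as aromatic (order 1.5)."""
    return sum(1 for _, _, order in bonds if order == 1.5)


for name, bonds in (("C1=n(C)-c=Cn1", ODD_RING), ("Cn1cccc1", N_METHYLPYRROLE)):
    print(name,
          "| bond-order sum on atom 1:", bond_order_sum(bonds, 1),
          "| aromatic bonds:", aromatic_bond_count(bonds))
```

With RDKit installed, the same sums can be read from a molecule parsed with sanitize=False by adding up Bond.GetBondTypeAsDouble() over atom.GetBonds().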
-greg

On Tue, Oct 30, 2018 at 10:02 PM Ivan Tubert-Brohman <[email protected]> wrote:

> Hi,
>
> I was surprised to see that a (dubious) structure that goes through
> SanitizeMol OK can fail a subsequent sanitization call:
>
> print("Start")
> mol = Chem.MolFromSmiles('C1=n(C)-c=Cn1', sanitize=False)
> print("Before first sanitization")
> Chem.SanitizeMol(mol)
> print("Before second sanitization")
> Chem.SanitizeMol(mol)
> print("Done")
>
>
> The output is:
>
> Start
> Before first sanitization
> Before second sanitization
> [16:54:20] Explicit valence for atom # 1 N, 4, is greater than permitted
> Traceback (most recent call last):
>   File "./san.py", line 9, in <module>
>     Chem.SanitizeMol(mol)
> ValueError: Sanitization error: Explicit valence for atom # 1 N, 4, is
> greater than permitted
>
>
> Is this an unavoidable aspect of the way SanitizeMol works, since it does
> several operations (Kekulize, check valencies, set aromaticity, conjugation
> and hybridization) in a certain order, or should this be considered a bug?
>
> Best,
> Ivan
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

