Hi Ivan,

Short answer: I would not normally expect a second sanitization to fail if
the first succeeds, but your input SMILES is very odd and triggers a bug.

This is an interesting edge case for the sanitization code because it
includes a weird mix of aromatic and aliphatic atoms and bonds. I do hope
this came out of some computational process and isn't a "real" molecule.
You almost couldn't have picked a better example to highlight the situation
that's causing the problem here. Some form of congratulations is in order.
:-)

Here's an explanation of what's going on with your molecule, C1=n(C)-c=Cn1.
The fundamental problem is that atom 1 (the first nitrogen) has a valence
of 4 and is neutral...
If you wrote the SMILES as C1=N(C)C=CN1, which is what the sanitization
process produces, I don't think you'd be surprised that the RDKit
sanitization fails (and your second call to sanitize does fail).
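As a quick check of that claim, here is a minimal sketch that sanitizes the
kekulized form directly and compares it with N-methylpyrrole. The helper name
sanitization_error is made up for this example; the only RDKit calls used are
Chem.MolFromSmiles and Chem.SanitizeMol.

```python
from rdkit import Chem


def sanitization_error(smiles):
    """Return the sanitization error message for a SMILES, or None if it passes.

    Hypothetical helper for illustration; not part of the RDKit API.
    """
    # Parse without sanitizing so that the failure happens in SanitizeMol.
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    try:
        Chem.SanitizeMol(mol)
    except ValueError as exc:  # RDKit sanitization errors derive from ValueError
        return str(exc)
    return None


# Neutral nitrogen with a double bond and two single bonds: valence 4.
print(sanitization_error("C1=N(C)C=CN1"))
# N-methylpyrrole, where the aromatic nitrogen is accepted.
print(sanitization_error("Cn1cccc1"))
```

The first call should report a valence problem on the nitrogen and the second
should return None; the exact wording of the message may vary between RDKit
versions.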

To understand why it passes the first time, you need to understand the flow
of the sanitization process, described here:
https://www.rdkit.org/docs/RDKit_Book.html#molecular-sanitization
Step 3, updatePropertyCache(), is the part that reports valency errors.
There's a special case in this code for aromatic atoms that allows atoms
like the N in Cn1cccc1 to pass sanitization even though they are formally
four-valent (2x1.5 for the aromatic bonds +1 for the C). Your molecule is
triggering that special case because atom 1 is aromatic in the input
SMILES. Incorrect aromatic rings that get through this step normally end up
getting caught later when the molecule is kekulized (step 5). In your case
there are no aromatic bonds to kekulize, so no error is thrown. The
aromaticity perception (step 6) does not consider the ring to be aromatic,
so the final molecule is the equivalent of C1=N(C)C=CN1.
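To see where in that flow a molecule passes or fails, the individual
operations can be run one at a time. The sketch below assumes the sanitizeOps
and catchErrors arguments of Chem.SanitizeMol and covers only a subset of the
steps; trace_sanitization and STEPS are names invented for this example. What
it reports for C1=n(C)-c=Cn1 depends on the RDKit version, so no particular
output is claimed for that molecule.

```python
from rdkit import Chem

# A subset of the sanitization operations, in the order SanitizeMol applies them.
# The labels are simply the flag names.
STEPS = [
    ("SANITIZE_PROPERTIES", Chem.SanitizeFlags.SANITIZE_PROPERTIES),
    ("SANITIZE_SYMMRINGS", Chem.SanitizeFlags.SANITIZE_SYMMRINGS),
    ("SANITIZE_KEKULIZE", Chem.SanitizeFlags.SANITIZE_KEKULIZE),
    ("SANITIZE_SETAROMATICITY", Chem.SanitizeFlags.SANITIZE_SETAROMATICITY),
]


def trace_sanitization(smiles):
    """Run the selected steps one by one and return [(step_name, passed)].

    Hypothetical helper for illustration; not part of the RDKit API.
    """
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    results = []
    for name, flag in STEPS:
        # With catchErrors=True, SanitizeMol returns the operation that failed,
        # or SANITIZE_NONE if this step went through.
        failed = Chem.SanitizeMol(mol, sanitizeOps=flag, catchErrors=True)
        results.append((name, failed == Chem.SanitizeFlags.SANITIZE_NONE))
    return results


for smiles in ("Cn1cccc1", "C1=n(C)-c=Cn1"):
    print(smiles, trace_sanitization(smiles))
```

Each step modifies the molecule in place, so the later steps see the result of
the earlier ones, just as they do inside a full SanitizeMol call.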

It ought to be possible to fix this in the sanitization code relatively
easily; I just need to think about it a bit and do a bunch of testing.

-greg

On Tue, Oct 30, 2018 at 10:02 PM Ivan Tubert-Brohman <
[email protected]> wrote:

> Hi,
>
> I was surprised to see that a (dubious) structure that goes through
> SanitizeMol OK can fail a subsequent sanitization call:
>
> from rdkit import Chem
>
> print("Start")
> mol = Chem.MolFromSmiles('C1=n(C)-c=Cn1', sanitize=False)
> print("Before first sanitization")
> Chem.SanitizeMol(mol)
> print("Before second sanitization")
> Chem.SanitizeMol(mol)
> print("Done")
>
>
> The output is:
>
> Start
> Before first sanitization
> Before second sanitization
> [16:54:20] Explicit valence for atom # 1 N, 4, is greater than permitted
> Traceback (most recent call last):
>   File "./san.py", line 9, in <module>
>     Chem.SanitizeMol(mol)
> ValueError: Sanitization error: Explicit valence for atom # 1 N, 4, is
> greater than permitted
>
>
> Is this an unavoidable aspect of the way SanitizeMol works, since it does
> several operations (Kekulize, check valencies, set aromaticity, conjugation
> and hybridization) in a certain order, or should this be considered a bug?
>
> Best,
> Ivan
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>