All,
I am writing filtering code in Python that searches my hit list (in SMILES) for
substructure matches to the patterns in my filter lists (in SMARTS). My current
filter lists were copied from Rajarshi Guha's blog at
http://blog.rguha.net/?p=850.
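A minimal sketch of the filtering step described above is shown below. It assumes RDKit is installed and that the filters are held as a dict of {filter name: SMARTS}. The helper `filter_hits` is hypothetical and is not the code I am actually running.

```python
from rdkit import Chem

def filter_hits(hits_smiles, filters_smarts):
    """Hypothetical helper: return {hit SMILES: [names of filters it matches]}."""
    # Parse each SMARTS filter once; MolFromSmarts returns None on a bad pattern
    queries = {}
    for name, smarts in filters_smarts.items():
        q = Chem.MolFromSmarts(smarts)
        if q is None:
            print('could not parse filter', name)
            continue
        queries[name] = q

    results = {}
    for smi in hits_smiles:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:  # skip hits whose SMILES cannot be parsed
            continue
        # Add explicit hydrogens so that patterns containing [#1] atoms
        # (as several of these filters do) have hydrogen atoms to match
        mol = Chem.AddHs(mol)
        results[smi] = [name for name, q in queries.items()
                        if mol.HasSubstructMatch(q)]
    return results
```

For example, `filter_hits(['CCO', 'c1ccccc1'], {'benzene': 'c1ccccc1', 'oxygen': '[#8]'})` should map ethanol to `['oxygen']` and benzene to `['benzene']`. Matching the SMARTS queries directly against the hits like this does not require converting the queries themselves to SMILES.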
While doing this I ran into a problem with the following SMARTS string from the
p_l150 collection, filter pyrrole_A(118):
n2(-[#6]:1:[!#1]:[#6]:[#6]:[#6]:[#6]:1)c(cc(c2-[#6;X4])-[#1])-[#6;X4]
The problem area is the [!#1] atom in the six-membered ring. Although this
should be interpreted as 'not H', the molecule generated by Chem.MolFromSmarts
does in fact render with a hydrogen at this position. Because that hydrogen sits
in the middle of an aromatic ring, it causes a valence error, so I can't
standardize the mol for filtering purposes.
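One way to see what the parser stores for `[!#1]` is to print, for each query atom, the atomic number RDKit holds alongside the SMARTS query on that atom. This is a minimal sketch assuming a recent RDKit. The atomic number is presumably what is used when the query is drawn or written out as SMILES, while `GetSmarts()` shows the actual query. The output may depend on the RDKit version, so none is shown here.

```python
from rdkit import Chem

# pyrrole_A(118) pattern as copied from the filter list
smarts = 'n2(-[#6]:1:[!#1]:[#6]:[#6]:[#6]:[#6]:1)c(cc(c2-[#6;X4])-[#1])-[#6;X4]'
query = Chem.MolFromSmarts(smarts)

for atom in query.GetAtoms():
    # index, atomic number stored on the atom, and the query it carries
    print(atom.GetIdx(), atom.GetAtomicNum(), atom.GetSmarts())
```

The `[!#1]` atom is index 2. If it reports atomic number 1, that would match the hydrogen seen in the rendering.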
I confirmed this by making the following edit to the SMARTS string:
n2(-[#6]:1:[!#6]:[#6]:[#6]:[#6]:[#6]:1)c(cc(c2-[#6;X4])-[#1])-[#6;X4]
This results in a carbon at the position where the original SMARTS gave a
hydrogen. Is this a problem with the SMARTS parser, or is there something
that I am missing?
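To estimate how many filters might be affected, a rough text-level check is to scan each SMARTS string for negated atomic-number primitives such as `!#1` or `!#6`. The helpers `negated_atomic_nums` and `flag_filters` below are hypothetical. They only apply a regular expression to the text and do not parse SMARTS, so other negation forms (e.g. `!C`) are not caught. Treat this as a heuristic, not a replacement for the RDKit parser.

```python
import re

# Matches a negated atomic-number primitive such as "!#1" or "!#16"
NEGATED_ATOMIC_NUM = re.compile(r'!#(\d+)')

def negated_atomic_nums(smarts):
    """Return atomic numbers that appear negated (e.g. [!#1]) in a SMARTS string."""
    return [int(n) for n in NEGATED_ATOMIC_NUM.findall(smarts)]

def flag_filters(filters):
    """Given {filter name: SMARTS}, keep only filters with a negated atomic number."""
    flagged = {}
    for name, smarts in filters.items():
        nums = negated_atomic_nums(smarts)
        if nums:
            flagged[name] = nums
    return flagged
```

For the pyrrole_A(118) pattern above, `negated_atomic_nums` should return `[1]`.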
This seems to happen quite frequently. When running standardization code on
the p_l150 filter collection (55 compounds) using:
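from rdkit import Chem
from molvs import Standardizer, standardize_smiles

# p_l150 is an existing pandas DataFrame; its 'mol file' column holds the
# query mols built with Chem.MolFromSmarts
p_l150['standardized mol'] = ''
s = Standardizer()
for i in p_l150.index:
    mf = p_l150.loc[i, 'mol file']
    try:
        m = Chem.MolToSmiles(mf)        # query mol -> SMILES
        m2 = standardize_smiles(m)      # MolVS SMILES standardization
        m3 = Chem.MolFromSmiles(m2)     # back to a sanitized mol
        smol = s.standardize(m3)
        p_l150.loc[i, 'standardized mol'] = smol
    except Exception as e:
        # report which filter failed and why
        print(p_l150.loc[i, 'filter'], e)
p_l150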
I get 11 errors, 8 of which are valence errors (7 of those involve hydrogens):
<regId="pyrrole_A(118)"> Sanitization error: Explicit valence for atom # 8 H, 3, is greater than permitted
<regId="imine_one_fives(89)"> Sanitization error: Explicit valence for atom # 3 H, 3, is greater than permitted
<regId="hzone_pipzn(79)"> Sanitization error: Explicit valence for atom # 3 H, 2, is greater than permitted
<regId="hzone_pyrrol(64)"> Sanitization error: Can't kekulize mol
<regId="cyano_pyridone_A(54)"> Sanitization error: Explicit valence for atom # 1 H, 3, is greater than permitted
<regId="het_pyridiniums_A(39)"> Sanitization error: Explicit valence for atom # 5 H, 3, is greater than permitted
<regId="diazox_sulfon_A(36)"> Sanitization error: Explicit valence for atom # 14 C, 5, is greater than permitted
<regId="pyrrole_B(29)"> Sanitization error: Explicit valence for atom # 9 H, 3, is greater than permitted
<regId="thiophene_hydroxy(28)"> Sanitization error: Can't kekulize mol
<regId="imidazole_A(19)"> Sanitization error: Explicit valence for atom # 4 H, 2, is greater than permitted
<regId="het_6_tetrazine(18)"> Sanitization error: Aromatic bonds on non aromatic atom 1
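As a cross-check of the counts quoted above (11 errors, 8 valence errors, 7 of them on hydrogen), the messages can be tallied with a short standard-library script. The helper `tally_errors` and the one-message-per-string input are assumptions for illustration.

```python
import re
from collections import Counter

# Captures the element symbol from RDKit's valence message, e.g.
# "Explicit valence for atom # 8 H, 3, is greater than permitted"
VALENCE_RE = re.compile(r'Explicit valence for atom # \d+ ([A-Z][a-z]?),')

def tally_errors(lines):
    """Count error kinds and, for valence errors, the element involved."""
    kinds = Counter()
    valence_elements = Counter()
    for line in lines:
        m = VALENCE_RE.search(line)
        if m:
            kinds['valence'] += 1
            valence_elements[m.group(1)] += 1
        elif "Can't kekulize" in line:
            kinds['kekulize'] += 1
        elif 'Aromatic bonds on non aromatic atom' in line:
            kinds['aromatic'] += 1
        else:
            kinds['other'] += 1
    return kinds, valence_elements

errors = [
    "Explicit valence for atom # 8 H, 3, is greater than permitted",   # pyrrole_A(118)
    "Explicit valence for atom # 3 H, 3, is greater than permitted",   # imine_one_fives(89)
    "Explicit valence for atom # 3 H, 2, is greater than permitted",   # hzone_pipzn(79)
    "Can't kekulize mol",                                              # hzone_pyrrol(64)
    "Explicit valence for atom # 1 H, 3, is greater than permitted",   # cyano_pyridone_A(54)
    "Explicit valence for atom # 5 H, 3, is greater than permitted",   # het_pyridiniums_A(39)
    "Explicit valence for atom # 14 C, 5, is greater than permitted",  # diazox_sulfon_A(36)
    "Explicit valence for atom # 9 H, 3, is greater than permitted",   # pyrrole_B(29)
    "Can't kekulize mol",                                              # thiophene_hydroxy(28)
    "Explicit valence for atom # 4 H, 2, is greater than permitted",   # imidazole_A(19)
    "Aromatic bonds on non aromatic atom 1",                           # het_6_tetrazine(18)
]
kinds, elements = tally_errors(errors)
print(sum(kinds.values()), kinds['valence'], elements['H'])  # prints: 11 8 7
```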
Any insight would be greatly appreciated.
Thank you
Christopher R. Bodle
PhD Candidate, University of Iowa
College of Pharmacy
Division of Medicinal and Natural Products Chemistry
115 S. Grand Avenue-Rm. S338
Iowa City, Iowa 52242
(319) 335-7845
------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss