James,
On Fri, Sep 30, 2011 at 8:48 AM, James Davidson <[email protected]> wrote:
>
> Greg wrote:
>> You actually don't need to add the Hs:
>> >>> p1 = Chem.MolFromSmarts('[#7,#8;H1]')
>> >>> p2 = Chem.MolFromSmarts('[#7,#8;H2]')
>> >>> p3 = Chem.MolFromSmarts('[#7,#8;H3]')
>> >>> m = Chem.MolFromSmiles('CC(=O)N')
>> >>> m2 = Chem.MolFromSmiles('OCC(=O)N')
>> >>> def NHOHCount(mol):
>> ...     return (len(mol.GetSubstructMatches(p1))
>> ...             + 2*len(mol.GetSubstructMatches(p2))
>> ...             + 3*len(mol.GetSubstructMatches(p3)))
>> ...
>> >>> NHOHCount(m)
>> 2
>> >>> NHOHCount(m2)
>> 3
>
> I think this system works well in almost all cases : ) However, I had a
> nagging concern over a couple of 'edge' cases - namely water, and
> ammonia (and for that matter, the oxonium and ammonium ions).
You're exactly right. I showed the SMARTS-based version as a simple
illustration. The version that's actually checked in uses a different
method: it loops over all O and N atoms and counts the number of Hs
attached to each.
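To make the atom-loop idea concrete, here is a minimal sketch of that kind of count. It is not the checked-in code; the function name count_nh_oh is made up for illustration, and it assumes the molecule came from SMILES without AddHs(), so the Hs are carried as counts on the heavy atoms:

```python
from rdkit import Chem


def count_nh_oh(mol):
    """Sum the hydrogens attached to every N and O atom.

    Hypothetical helper for illustration only. GetTotalNumHs() returns
    the implicit plus explicit H count stored on the atom; it does not
    count H atoms that are real nodes in the graph (e.g. after AddHs()).
    """
    total = 0
    for atom in mol.GetAtoms():
        if atom.GetAtomicNum() in (7, 8):  # nitrogen or oxygen
            total += atom.GetTotalNumHs()
    return total


for smi in ("CC(=O)N", "O", "[NH4+]", "[OH3+]"):
    print(smi, count_nh_oh(Chem.MolFromSmiles(smi)))
```

Because it counts per atom rather than per pattern, water, ammonia and their ions need no special cases.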
> I guess the simple inclusion of P4 = Chem.MolFromSmarts('[#8;H4]') would
> make sure all cases were covered(?).
>
> Out of interest, I decided to compile a small list of 'normal' and
> 'edge' case SMILES, and ran it through the MOE descriptor node in KNIME.
> For all these cases, lip_don behaves as I would expect (tab-separated
> output included below)
Some comments on this below.
>
> "SMILES" "a_acc" "a_don" "lip_acc" "lip_don"
> "CO" 1.0 1.0 1.0 1.0
> "C(=O)N" 1.0 1.0 2.0 2.0
> "O" 1.0 1.0 1.0 2.0
> "CN" 1.0 1.0 1.0 2.0
> "[O+]" 1.0 0.0 1.0 3.0
> "C[O+]" 1.0 0.0 1.0 2.0
> "[N+]" 0.0 0.0 1.0 4.0
> "C[N+]" 0.0 0.0 1.0 3.0
> "[N-]" 0.0 1.0 1.0 2.0
> "[O-]" 0.0 1.0 1.0 1.0
> "C(=O)[N-]" 0.0 1.0 2.0 1.0
For what it's worth: these results are definitely not correct for the
SMILES as provided. Atoms written in square brackets in SMILES get no
implicit Hs, so [N+] has zero hydrogens. I guess you provided the
molecules to MOE in some other form.
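To illustrate the bracket-atom rule without any toolkit, here is a rough pure-Python sketch that reads the H count straight out of a single bracket atom. The regular expression and the name bracket_h_count are my own simplification for this example, not a full SMILES parser and not how the RDKit does it:

```python
import re

# Simplified bracket-atom grammar: [isotope symbol chirality Hcount charge :class]
# This is an assumption-laden sketch; it handles only one bracket atom at a time.
_BRACKET_ATOM = re.compile(
    r"^\[(?P<isotope>\d+)?"
    r"(?P<symbol>[A-Z][a-z]?|[a-z]{1,2})"
    r"(?P<chiral>@{1,2})?"
    r"(?P<hcount>H\d*)?"
    r"(?P<charge>[+-]\d+|[+-]+)?"
    r"(?P<map>:\d+)?\]$"
)


def bracket_h_count(atom):
    """Return the hydrogen count written inside one bracket atom.

    No 'H' field means zero hydrogens; a bare 'H' means one.
    """
    match = _BRACKET_ATOM.match(atom)
    if match is None:
        raise ValueError("not a single bracket atom: %r" % atom)
    hcount = match.group("hcount")
    if hcount is None:
        return 0
    return int(hcount[1:]) if len(hcount) > 1 else 1


for atom in ("[N+]", "[NH4+]", "[O-]", "[OH-]"):
    print(atom, bracket_h_count(atom))
```

So [N+] and [NH4+] are different species as far as any SMILES reader is concerned, which is why the table needs the Hs written out.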
Sample script using your data (with corrected SMILES):
# -------------------
from rdkit import Chem
from rdkit.Chem import Lipinski

# Columns: SMILES, a_acc, a_don, lip_acc, lip_don (MOE values from your table)
d = [
    ["CO", 1.0, 1.0, 1.0, 1.0],
    ["C(=O)N", 1.0, 1.0, 2.0, 2.0],
    ["O", 1.0, 1.0, 1.0, 2.0],
    ["CN", 1.0, 1.0, 1.0, 2.0],
    ["[OH3+]", 1.0, 0.0, 1.0, 3.0],
    ["C[OH2+]", 1.0, 0.0, 1.0, 2.0],
    ["[NH4+]", 0.0, 0.0, 1.0, 4.0],
    ["C[NH3+]", 0.0, 0.0, 1.0, 3.0],
    ["[NH2-]", 0.0, 1.0, 1.0, 2.0],
    ["[OH-]", 0.0, 1.0, 1.0, 1.0],
    ["C(=O)[NH-]", 0.0, 1.0, 2.0, 1.0],
]

print('Smiles NOCount NHOHCount')
for row in d:
    m = Chem.MolFromSmiles(row[0])
    hba = Lipinski.NOCount(m)    # number of N and O atoms
    hbd = Lipinski.NHOHCount(m)  # number of Hs on N and O atoms
    print(row[0], hba, hbd)
#-----------------------------------
Output with the SVN version of the RDKit:
#------------------
Smiles NOCount NHOHCount
CO 1 1
C(=O)N 2 2
O 1 2
CN 1 2
[OH3+] 1 3
C[OH2+] 1 2
[NH4+] 1 4
C[NH3+] 1 3
[NH2-] 1 2
[OH-] 1 1
C(=O)[NH-] 2 1
#-----------------
Best,
-greg