Dear James,
On Wed, Sep 28, 2011 at 8:04 PM, James Davidson <[email protected]> wrote:
>
> Apologies for posting on a rather well-trodden (and tedious?) topic...
> I have just spent some time counting H-bond donors in a variety of ways
> for a 4000 compound data set - to see how our calculations could best
> match the results coming from a collaborator (it's as fun as it sounds).
Yeah that sounds like a blast. :-S
>
> As it turns out, Descriptors.NumHDonors is pretty much on the money! (i.e.
> counting the number of N and O atoms that have at least one H attached)
> However, during this process I ended up going back to a (the?) Lipinski
> paper -
>
> Christopher A Lipinski, Franco Lombardo, Beryl W Dominy, Paul J Feeney,
> Experimental and computational approaches to estimate solubility and
> permeability in drug discovery and development settings, Advanced Drug
> Delivery Reviews, Volume 46, Issues 1-3, 1 March 2001, Pages 3-26, ISSN
> 0169-409X, 10.1016/S0169-409X(00)00129-0.
>
> - to see what the Lipinski definition of Hydrogen Bond Donors was. I
> read the following:
>
> "We found that simply adding the number of NH bonds and OH bonds does
> remarkably well as an index of H bond donor character. Importantly, this
> parameter has direct structural relevance to the chemist."
Interesting. Thanks for actually reading the paper.
> As far as I can tell, this would require explicit addition of Hs to the
> molecule, followed by counting the number of matches for an NH or OH
> BOND; something like the following:
>
> >>> from rdkit import Chem
> >>> from rdkit.Chem import Descriptors
>
> >>> smarts = Chem.MolFromSmarts("[#7,#8]-[#1]")
> >>> mol = Chem.MolFromSmiles("CC(=O)N")
> >>> mol = Chem.AddHs(mol)
> >>> matches = mol.GetSubstructMatches(smarts)
> >>> print(len(matches))
> 2
You actually don't need to add the Hs:
>>> p1 = Chem.MolFromSmarts('[#7,#8;H1]')
>>> p2 = Chem.MolFromSmarts('[#7,#8;H2]')
>>> p3 = Chem.MolFromSmarts('[#7,#8;H3]')
>>> m = Chem.MolFromSmiles('CC(=O)N')
>>> m2 = Chem.MolFromSmiles('OCC(=O)N')
>>> def NHOHCount(mol):
...     return (len(mol.GetSubstructMatches(p1))
...             + 2 * len(mol.GetSubstructMatches(p2))
...             + 3 * len(mol.GetSubstructMatches(p3)))
...
>>> NHOHCount(m)
2
>>> NHOHCount(m2)
3
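The three patterns above only cover atoms with one to three hydrogens. As a rough sketch of the same idea without SMARTS (the helper name count_nh_oh_bonds is made up for illustration, and this is not necessarily how Lipinski.NHOHCount() is or will be implemented), you can sum the hydrogen count on every N and O atom directly. Passing includeNeighbors=True means it should also work on molecules where the Hs have been added explicitly:

```python
from rdkit import Chem


def count_nh_oh_bonds(mol):
    """Count N-H and O-H bonds by summing the hydrogens on each N and O atom.

    Hypothetical helper for illustration only. includeNeighbors=True makes
    GetTotalNumHs() count explicit H atoms in the graph as well as implicit ones.
    """
    return sum(
        atom.GetTotalNumHs(includeNeighbors=True)
        for atom in mol.GetAtoms()
        if atom.GetAtomicNum() in (7, 8)  # nitrogen or oxygen
    )


for smiles in ("CC(=O)N", "OCC(=O)N"):
    mol = Chem.MolFromSmiles(smiles)
    print(smiles, count_nh_oh_bonds(mol))
```

For the two test molecules this gives the same 2 and 3 as the SMARTS version.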
> The MOE descriptor lip_don seems to exactly reproduce these 'bond count'
> numbers for my set of compounds. So I guess my question is - shouldn't
> we be counting the NH and OH bonds for Lipinski-like counting? (and I
> guess this is what MOE's lip_don is for)
I think we should be, yes. I believe that this is a bug in the current
Lipinski.NHOHCount() function and I will go ahead and fix it. Thanks
for pointing it out.
Best,
-greg