On Feb 1, 2011, at 4:49 AM, Chris Morley wrote:

> On 01/02/2011 07:12, Andrew Dalke wrote:
>> On Jan 31, 2011, at 8:15 PM, chakravar...@ncbs.res.in wrote:
>>> I noticed that molecules such as Myristic acid and Palmitic acid have
>>> the same similarity score of 1,
>>  ...
>>> I am thinking of changing from the Tanimoto score to another
>>> coefficient, like the Kulczynski index or the Russell index.
>> The only way to get a Tanimoto score of 1 is if the two fingerprints
>> are identical. In that case no scoring method can tell the difference
>> between the two, because they are identical.
>>
>> To get what you want you'll need to come up with a new fingerprinting
>> scheme, not a new scoring method.
> None of OpenBabel's fingerprints provide a complete description of a
> molecule. They are really intended as part of a fast screening method to
> exclude molecules that, compared with a target molecule, are too
> dissimilar (or too similar) or which are not a superstructure of it.
> None of the current fingerprint types include stereochemistry and the
> FP2 fingerprint has a built-in lack of certainty because different
> fragments can be assigned to the same bit. It also indexes by the
> presence or absence of fragments of up to 7 atoms, so does not
> discriminate well for long chains of carbon atoms, like fatty acids or
> normal hydrocarbons. It is possible to make specialized FP3 fingerprints
> to handle this type of structure, by including the number of times a
> substructure occurs. Further description is in the code, although
> recompilation is not necessary to make a new fingerprint type. However I
> guess this is probably further than you want to go.
>

It looks like that is where I want to go. It seems that the FP2 isn't  
going to be good enough and I will probably end up having to customize  
at least to some extent. If we are going to get into customization, we  
may as well also look at tuning the fingerprints so that similar  
structures result in somewhat similar activity. To that end we are  
putting together a testing dataset from our open compounds. It looks  
like it will be about 9000 compounds, each meeting two criteria:
1) It has been tested at least twice in the NCI-60 dose response assay.
This should mean that its NCI-60 correlations are reasonably well
determined.
2) Its 2D structure exists and is consistent with the molecular
formula stored independently in our database. This consistency is
checked via CDK, so it means that at least CDK is able to assign atom
types well enough to get to the correct molecular formula.

We will calculate all the NCI-60 pairwise correlations and will post  
these and the structures. Should be done in a week or so. We will be  
looking to find a set of fingerprints that
1) never (or as close to never as we can get) returns a value of 1.0
for different structures.
2) has a well-behaved (or maybe just well-documented) relation between
structure similarity and NCI-60 correlation. I'm not sure what we will
get here, but I would like to be able to say something like a
similarity score of >0.9 gives an 80% chance of an NCI-60 correlation
of >0.6.
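
To make point 1 concrete, here is a rough sketch of why a presence/absence
fingerprint built from fragments of up to 7 atoms cannot separate two long
chains, while a count-based one can. This is a toy model, not OpenBabel's
FP2 or FP3 code: the "molecules" are plain strings of heavy atoms, only
linear fragments are considered, and the function names and the min/max
form of the count-based Tanimoto are my own assumptions for illustration.

```python
from collections import Counter


def linear_fragments(chain, max_len=7):
    """Count every linear fragment of 1..max_len atoms in a toy chain."""
    counts = Counter()
    for size in range(1, max_len + 1):
        for start in range(len(chain) - size + 1):
            frag = chain[start:start + size]
            # A path and its reverse are the same fragment.
            counts[min(frag, frag[::-1])] += 1
    return counts


def tanimoto_binary(a, b):
    """Tanimoto on presence/absence only (which fragments occur at all)."""
    keys_a, keys_b = set(a), set(b)
    return len(keys_a & keys_b) / len(keys_a | keys_b)


def tanimoto_counts(a, b):
    """One common count-based generalization: sum of mins over sum of maxes."""
    keys = set(a) | set(b)
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total


# Toy stand-ins: 14 and 16 carbons followed by one oxygen (assumption,
# a real fatty acid has a branched carboxyl group).
myristic_like = linear_fragments("C" * 14 + "O")
palmitic_like = linear_fragments("C" * 16 + "O")

print("binary:", tanimoto_binary(myristic_like, palmitic_like))
print("counts:", tanimoto_counts(myristic_like, palmitic_like))
```

Both chains contain exactly the same set of short fragments, so the binary
score cannot tell them apart; only the occurrence counts differ.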

I'm thinking we might also put together a dataset from the compounds
tested in the onedose assay. That set would make it possible to look
at the relation of structure similarity to the chances that a compound
will pass the onedose criteria.
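
Both the NCI-60 statement and the onedose question come down to the same
kind of estimate: among compound pairs above some similarity cutoff, what
fraction also clears an activity cutoff? A minimal sketch follows, assuming
the data is available as (similarity, correlation) pairs. The function
name, the default cutoffs as parameters, and the example values are
hypothetical placeholders, not real results.

```python
def hit_rate(pairs, sim_cutoff=0.9, corr_cutoff=0.6):
    """Fraction of pairs with similarity > sim_cutoff whose correlation
    is also > corr_cutoff. Returns None if no pair passes sim_cutoff."""
    selected = [corr for sim, corr in pairs if sim > sim_cutoff]
    if not selected:
        return None
    hits = sum(1 for corr in selected if corr > corr_cutoff)
    return hits / len(selected)


# Made-up (similarity, NCI-60 correlation) pairs, for illustration only.
example_pairs = [
    (0.95, 0.70),
    (0.92, 0.40),
    (0.97, 0.80),
    (0.50, 0.90),
    (0.91, 0.65),
]

print(hit_rate(example_pairs))
```

With about 9000 compounds there are roughly 40 million pairs, so the
number of pairs that actually pass the similarity cutoff is worth
reporting next to the fraction.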

DanZ

/********************************************
  *  Daniel Zaharevitz
  *  Chief, Information Technology Branch
  *  Developmental Therapeutics Program
  *  National Cancer Institute
  *  zahar...@mail.nih.gov
  *
  ********************************************/





_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
