On Feb 1, 2011, at 4:49 AM, Chris Morley wrote:

> On 01/02/2011 07:12, Andrew Dalke wrote:
>> On Jan 31, 2011, at 8:15 PM, chakravar...@ncbs.res.in wrote:
>>> I noticed that molecules such as Myristic acid and Palmitic acid
>>> have same similarity score of 1,
>> ...
>>> I am thinking of modifying Tanimoto score to other coefficients
>>> like Kulczynski index or Russel index.
>>
>> The only way to get a Tanimoto score of 1 is if the two
>> fingerprints are identical. In that case no scoring method
>> can tell the difference between the two because they are identical.
>>
>> To get what you want you'll need to come up with a new
>> fingerprinting scheme, not a new scoring method.
>
> None of OpenBabel's fingerprints provide a complete description of a
> molecule. They are really intended as part of a fast screening method to
> exclude molecules that, compared with a target molecule, are too
> dissimilar (or too similar) or which are not a superstructure of it.
> None of the current fingerprint types include stereochemistry and the
> FP2 fingerprint has a built-in lack of certainty because different
> fragments can be assigned to the same bit. It also indexes by the
> presence or absence of fragments of up to 7 atoms, so does not
> discriminate well for long chains of carbon atoms, like fatty acids or
> normal hydrocarbons. It is possible to make specialized FP3 fingerprints to
> handle this type of structure, by including the number of times a
> substructure occurs. Further description is in the code, although
> recompilation is not necessary to make a new fingerprint type. However I
> guess this is probably further than you want to go.
>
It looks like that is where I want to go. It seems that FP2 isn't going to be good enough, and I will probably end up having to customize, at least to some extent. If we are going to get into customization, we may as well also look at tuning the fingerprints so that similar structures result in somewhat similar activity.

To that end we are putting together a testing dataset from our open compounds. It looks like it will be about 9000 compounds, chosen by these criteria:

1) The compound has been tested at least twice in the NCI-60 dose response assay. This should mean that the NCI-60 correlations are reasonably well determined.

2) The 2D structure exists and is consistent with the molecular formula stored independently in our database. This consistency is checked via CDK, so it means that at least CDK is able to assign atom types well enough to get to the correct molecular formula.

We will calculate all the NCI-60 pairwise correlations and will post these and the structures. This should be done in a week or so.

We will be looking to find a set of fingerprints that:

1) never (or as close to never as we can get) return a value of 1.0 for different structures.

2) have a well behaved (or maybe just well documented) relation between structure similarity and NCI-60 correlation. I'm not sure what we will get here, but I would like to be able to say something like "a similarity score of >0.9 gives an 80% chance of an NCI-60 correlation of >0.6".

I'm thinking we might also put together a dataset from the compounds tested in the onedose assay. That set would make it possible to look at the relation of structure similarity to the chances that a compound will pass the onedose criteria.
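As a rough sketch of how the two fingerprint criteria above might be checked once the dataset exists, something like the following could work. It is plain Python with no Open Babel or CDK calls. Fingerprints are assumed to be available as sets of on-bit indices, and the function names (`tanimoto`, `count_collisions`, `hit_rate`), the cutoffs, and the toy values are all made up for illustration, not taken from any real data.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 1.0  # assumed convention: two empty fingerprints count as identical
    return len(a & b) / union


def count_collisions(fingerprints):
    """Criterion 1: count pairs of different compounds whose similarity is exactly 1.0.

    fingerprints: dict mapping a compound id to its set of on-bit indices.
    """
    return sum(
        1
        for fp_a, fp_b in combinations(fingerprints.values(), 2)
        if tanimoto(fp_a, fp_b) == 1.0
    )


def hit_rate(pairs, sim_cutoff=0.9, corr_cutoff=0.6):
    """Criterion 2: among pairs with similarity > sim_cutoff, the fraction
    whose NCI-60 correlation is > corr_cutoff.

    pairs: iterable of (similarity, correlation) tuples, one per compound pair.
    Returns None when no pair passes the similarity cutoff.
    """
    selected = [corr for sim, corr in pairs if sim > sim_cutoff]
    if not selected:
        return None
    return sum(corr > corr_cutoff for corr in selected) / len(selected)


if __name__ == "__main__":
    # Hypothetical toy fingerprints: A and B collide, C differs in one bit.
    toy_fps = {"A": {1, 2, 3}, "B": {1, 2, 3}, "C": {1, 2, 4}}
    print("collisions:", count_collisions(toy_fps))

    # Hypothetical (similarity, correlation) values for four compound pairs.
    toy_pairs = [(0.95, 0.7), (0.92, 0.4), (0.91, 0.8), (0.5, 0.9)]
    print("hit rate:", hit_rate(toy_pairs))
```

With about 9000 compounds there are roughly 40 million pairs, so the all-pairs loop in `count_collisions` is only meant to show the logic. For criterion 1 alone, grouping compounds by identical fingerprint would be a cheaper way to find the collisions.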
DanZ

/********************************************
 * Daniel Zaharevitz
 * Chief, Information Technology Branch
 * Developmental Therapeutics Program
 * National Cancer Institute
 * zahar...@mail.nih.gov
 *
 ********************************************/

_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss