Greg,

> I used a number of data sets to do the testing, but here's the basic
> idea: I take M patterns and search a pool of N molecules with them to
> find substructure matches. This means, theoretically, that I would
> have to do MxN substructure searches. I reduce this using the
> substructure fingerprints: if the fingerprint for pattern molecule i
> contains bits that are not set in pool molecule j, then I don't need
> to do the substructure search for that pair of molecules. A perfectly
> effective fingerprint would give 100% accuracy: every pair that passes
> the fingerprint test would actually contain a substructure match. Of
> course perfection is too much to hope for, but the goal is to get the
> accuracy as high as possible.
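A minimal Python sketch of the screen described above, assuming fingerprints are stored as Python integers used as bitsets. `toy_fp` and `is_substructure` are hypothetical stand-ins (a character-based fingerprint and a plain substring check), not RDKit calls. The sketch shows that "no pattern bit is missing from the pool molecule" and "(pattern AND pool) == pattern" are the same test, and it computes accuracy as the fraction of screen-passing pairs that turn out to be true matches.

```python
def passes_screen(fp_pattern: int, fp_pool: int) -> bool:
    # Greg's phrasing: reject the pair if the pattern sets any bit
    # that is not set in the pool molecule.
    return (fp_pattern & ~fp_pool) == 0

def passes_screen_alt(fp_pattern: int, fp_pool: int) -> bool:
    # Markus's phrasing: (pattern AND pool) == pattern. Logically equivalent.
    return (fp_pattern & fp_pool) == fp_pattern

def screen_accuracy(patterns, pool, fp, is_substructure) -> float:
    """Fraction of screen-passing pairs that are real substructure matches."""
    passed = matched = 0
    for pat in patterns:
        fp_pat = fp(pat)
        for mol in pool:
            if not passes_screen(fp_pat, fp(mol)):
                continue  # fingerprint rules out a match; skip the expensive search
            passed += 1
            if is_substructure(pat, mol):
                matched += 1
    return matched / passed if passed else 1.0

# Hypothetical toy data: "molecules" are strings, "substructure" is a substring,
# and the fingerprint sets one bit per distinct character. A substring can never
# contain a character its container lacks, so this screen has no false negatives.
def toy_fp(s: str) -> int:
    bits = 0
    for ch in s:
        bits |= 1 << (ord(ch) % 64)
    return bits

patterns = ["CO", "CN"]
pool = ["CCO", "OCC", "NCC", "CCC"]
acc = screen_accuracy(patterns, pool, toy_fp, lambda p, m: p in m)
print(f"{acc:.3f}")  # prints 0.333: 3 of 8 pairs pass the screen, 1 is a true match
```

Of the 8 pattern/pool pairs, only 3 reach the expensive search, and the two screen-passing false positives ("CO" vs "OCC", "CN" vs "NCC") are exactly what lowers the accuracy below 100%.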
I admit I haven't touched the RDKit substructure search yet, but what you write here is the normal definition of fingerprint screening as I learned it:

(bit_pattern(pattern) & bit_pattern(pool molecule)) == bit_pattern(pattern)

which should be the same as what you write; you just state the test inversely. Or do I misunderstand something completely?

Markus

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

