Greg,

> I used a number of data sets to do the testing, but here's the basic
> idea: I take M patterns and search a pool of N molecules with them to
> find substructure matches. This means, theoretically, that I would
> have to do MxN substructure searches. I reduce this using the
> substructure fingerprints: if the fingerprint for pattern molecule i
> contains bits that are not set in pool molecule j, then I don't need
> to do the substructure search for that pair of molecules. A perfectly
> effective fingerprint would give 100% accuracy: every pair that passes
> the fingerprint test would actually contain a substructure match. Of
> course perfection is too much to hope for, but the goal is to get the
> accuracy as high as possible.

I admit I haven't touched the RDKit substructure search yet, but what
you write here is the normal definition of fingerprinting as I learned
it:

(bit_pattern(pattern) & bit_pattern(pool molecule)) == bit_pattern(pattern)

which should be about the same as what you write; you just state the test
inversely. Or am I misunderstanding something completely?
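
To make the comparison concrete, here is a minimal Python sketch of the two
phrasings. It assumes fingerprints are held as plain integers used as bit
sets; the function names and the toy data are made up for illustration and
are not RDKit API.

```python
def passes_screen(pattern_fp: int, mol_fp: int) -> bool:
    """True if every bit set in the pattern is also set in the molecule."""
    return (pattern_fp & mol_fp) == pattern_fp


def fails_screen(pattern_fp: int, mol_fp: int) -> bool:
    """The inverse phrasing: the pattern has bits that the molecule lacks."""
    return (pattern_fp & ~mol_fp) != 0


def candidate_pairs(pattern_fps, pool_fps):
    """Return the (i, j) pairs that still need a real substructure search."""
    return [
        (i, j)
        for i, p in enumerate(pattern_fps)
        for j, m in enumerate(pool_fps)
        if passes_screen(p, m)
    ]


# Hypothetical 4-bit fingerprints: M = 2 patterns, N = 3 pool molecules.
patterns = [0b0101, 0b0011]
pool = [0b0111, 0b0100, 0b1101]
pairs = candidate_pairs(patterns, pool)
```

With this toy data only 3 of the 2x3 = 6 pairs survive the screen, and
passes_screen is always the exact negation of fails_screen, which is why
the two descriptions should pick the same pairs.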

Markus

