Dear Alexis,

The concept you are describing is pretty much exactly the reason why molecular keys / fingerprints were invented in the first place. I would suggest taking a look at the RDKit database cartridge (https://www.rdkit.org/docs/Cartridge.html), since it should do basically what you want to achieve: you import your millions of structures into a database, build an index (which consists mainly of suitable fingerprints), and then pre-filter your substructure searches with that index.
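To make the pre-filtering idea concrete, here is a minimal sketch of the screening logic in plain Python. It is not how the cartridge is implemented. It only illustrates the principle: if the fingerprints are built so that a substructure never sets a bit its superstructure lacks, any query with a bit missing from the target can be skipped without running the real match. The fingerprints below are toy integers, and `may_be_substructure`, `screened_matches` and `exact_match` are hypothetical names made up for this example. With RDKit you would presumably use fingerprints designed for substructure screening (for example those from `Chem.PatternFingerprint`) and pass a wrapper around `HasSubstructMatch` as the exact check; that mapping is an assumption on my side and is not tested here.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple


def may_be_substructure(query_fp: int, target_fp: int) -> bool:
    """Necessary (not sufficient) condition for a substructure match.

    Every bit set in the query fingerprint must also be set in the
    target fingerprint; otherwise the query cannot be contained.
    """
    return (query_fp & target_fp) == query_fp


def screened_matches(
    targets: Dict[str, int],
    queries: Iterable[Tuple[str, int]],
    exact_match: Callable[[str, str], bool],
) -> Iterator[Tuple[str, str]]:
    """Yield (target_id, query_id) pairs that pass both checks.

    `targets` (the few hundred mol1s) stay in memory, while `queries`
    (the millions of mol2s) can be any iterable, so they can be
    streamed from disk instead of being stored.
    `exact_match` is a hypothetical stand-in for the expensive check.
    """
    for query_id, query_fp in queries:
        for target_id, target_fp in targets.items():
            # Cheap bitwise screen first: skip pairs that cannot match.
            if not may_be_substructure(query_fp, target_fp):
                continue
            # Only the survivors pay for the expensive exact search.
            if exact_match(target_id, query_id):
                yield target_id, query_id
```

Looping over the queries on the outside means each mol2 is read only once, which fits your constraint that not all mol2s fit into memory. A pair that passes the screen can still fail the exact search, so the screen only ever saves work and never changes the result.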
Hope that helps,
Nils

On 18.08.2018 at 11:16, Alexis Parenty wrote:
> Dear rdkiter,
>
> I'd like to optimize an algorithm that is slow due to substructure
> searches. I am doing several million substructure searches using
> mol1.HasSubstructMatch(mol2).
>
> I have hundreds of mol1s and millions of mol2s. Most of the time mol2 is
> not a substructure of mol1, so I was thinking of using a filter to skip the
> expensive substructure search when mol2 is guaranteed not to
> be a substructure of mol1, such as when:
>
> - The molecular formula of mol2 cannot be part of the molecular formula
>   of mol1 (e.g.: C5H5N versus C6H6)
>
> - The molecular weight of mol2 is higher than the molecular weight of mol1.
>
> I am hoping this filter would skip many substructure searches, but have
> I forgotten something else that could be used in my filter? Is there a
> way to use some fingerprint?
>
> I can store the molecular formula, RDKFingerprint, and molecular weight of
> the mol1s and mol2s in a dictionary so I don't have to calculate them on the
> fly. Note that I do not have enough memory available to store all the
> mol2s.
>
> Any advice would be very much appreciated.
>
> Best,
>
> Alexis

_______________________________________________
Rdkit-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

