Dear Gonzalo,

On Tue, Jul 24, 2012 at 2:00 PM, Gonzalo Colmenarejo-Sanchez <[email protected]> wrote:
>
> I’ve been doing speed comparisons of SMARTS matching calculations between
> Daylight (dt_match) and the latest release of the RDKit (SubstructMatch). A
> matrix of 4015 SMILES matched against 1390 SMARTS took 187 s in DL, while it
> took 1615 s in the RDKit program. Maybe this is an area of improvement of
> the RDKit.
Thanks for that information; things like this are very useful. Can you share how you're doing the comparison or, even better, the SMARTS you are using? I ask because this is much slower than I would expect, so I wonder whether you are constructing the molecules from SMILES and the queries from SMARTS before each substructure-matching call (which would be extremely slow) or building the molecules and queries once.

Two of the regular benchmarks I run with the RDKit (http://code.google.com/p/rdkit/wiki/Benchmarking) involve substructure searching (t7 and t8 in that table); there, 428K matches take about 6 seconds on my Linux box. Another example calculation, where I search for 500 substructures in 11000 molecules (5.5 million molecule/query pairs, about the same scale as your 4015 x 1390, roughly 5.6 million), takes about 26 seconds.

Best,
-greg

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
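The two ways of organizing the matching loop contrasted above can be sketched in Python as follows. This is a minimal illustration, not code from the original thread: `Chem.MolFromSmiles`, `Chem.MolFromSmarts` and `HasSubstructMatch` are standard RDKit calls, while the function names `match_matrix_reparse` and `match_matrix_prebuilt` and the toy SMILES/SMARTS lists are hypothetical. Both functions return the same boolean match matrix; the only difference is how many times each input string gets parsed.

```python
import time

from rdkit import Chem


def match_matrix_reparse(smiles_list, smarts_list):
    """Slow pattern (hypothetical helper): re-parses inside the inner loop."""
    rows = []
    for smi in smiles_list:
        row = []
        for sma in smarts_list:
            mol = Chem.MolFromSmiles(smi)     # parsed once per query
            query = Chem.MolFromSmarts(sma)   # parsed once per molecule
            row.append(mol.HasSubstructMatch(query))
        rows.append(row)
    return rows


def match_matrix_prebuilt(smiles_list, smarts_list):
    """Fast pattern (hypothetical helper): parse each input exactly once."""
    mols = [Chem.MolFromSmiles(smi) for smi in smiles_list]
    queries = [Chem.MolFromSmarts(sma) for sma in smarts_list]
    # RDKit returns None for strings it cannot parse.
    if any(m is None for m in mols) or any(q is None for q in queries):
        raise ValueError("unparseable SMILES or SMARTS in input")
    return [[mol.HasSubstructMatch(q) for q in queries] for mol in mols]


if __name__ == "__main__":
    # Toy inputs (assumed, not the data from the thread), repeated for timing.
    smiles = ["c1ccccc1O", "CCO", "CC(=O)O"] * 50
    smarts = ["[OX2H]", "c1ccccc1", "C(=O)[OH]"] * 10
    for fn in (match_matrix_reparse, match_matrix_prebuilt):
        start = time.perf_counter()
        fn(smiles, smarts)
        print(f"{fn.__name__}: {time.perf_counter() - start:.3f} s")
```

With N molecules and M queries, the first version performs N x M parses of each kind, while the second performs only N + M, so on inputs like the 4015 x 1390 matrix above the parsing overhead can dominate the run time.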

