On Tue, Jul 24, 2012 at 4:38 PM, Gonzalo Colmenarejo-Sanchez
<[email protected]> wrote:
>
> Sorry I can't share the SMILES and SMARTS, they are proprietary.
Yeah, I kind of figured that would be the case. :-)
> If you can send me your structures I can test them with my program.
The scripts and data for the benchmarking are all in $RDBASE/Regress
> I double loop in the building of molecules and queries; the actual code is
> this:
>
> for (i = 0; i < numsmi; i++)
> {
>     mol = SmilesToMol(smiles[i].smiles);
>     numsims = 0;
>     fprintf(fpout, "%s,", smiles[i].smiles);
>     fprintf(stdout, "%d\n", i);
>     for (j = 0; j < numsma; j++)
>     {
>         pattern = SmartsToMol(smarts[j].smarts);
>         matchesfound = SubstructMatch(*mol, *pattern, matches, false, false);
>         if (matchesfound == true)
>         {
>             numsims = numsims + 1;
>             if (numsims == 1) fprintf(fpout, "%s\n", smarts[j].smarts);
>             else fprintf(fpout, "%s,%s\n", smiles[i].smiles, smarts[j].smarts);
>         }
>         delete pattern;
>     }
>     if (numsims == 0) fprintf(fpout, "\n");
>     delete mol;
> }
>
>
> The same double loop structure is used in the DL program. I could build the
> molecules and queries at once as you suggest but I'm kind of testing my
> typical situation that involves millions of molecules - not sure if that
> many molecules can be stored in memory.
>
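The above is OK w.r.t. the molecules: each molecule is only
constructed once.[1] Your SMARTS queries, on the other hand, are
constructed again for every molecule, so SmartsToMol runs
numsmi * numsma times instead of numsma times. You would probably see
some speedup by building the query molecules once, outside the
molecule loop, and just using those inside the loop. That keeps only
the queries in memory; the molecules are still handled one at a time.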
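
Here is a minimal sketch of that loop structure. To keep it
self-contained it does not call the RDKit: Parsed, parse(), matches(),
parseCalls and countHits() are hypothetical stand-ins that I made up
for illustration. In your program parse() would be SmilesToMol /
SmartsToMol and matches() would be SubstructMatch. The counter is only
there to make the saving visible: for 3 molecules and 2 patterns the
parser runs 5 times here, versus 9 times with the nested construction.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed molecule or query. In the quoted
// program this is whatever SmilesToMol / SmartsToMol hands back.
struct Parsed {
  std::string text;
};

// Counts parser calls so the effect of hoisting the queries is visible.
std::size_t parseCalls = 0;

// Stand-in parser (assumption): a real SMILES/SMARTS parser costs far more.
std::unique_ptr<Parsed> parse(const std::string &text) {
  ++parseCalls;
  return std::unique_ptr<Parsed>(new Parsed{text});
}

// Stand-in matcher (assumption): plain substring search, which is NOT
// substructure matching; it only gives the loops something to call.
bool matches(const Parsed &mol, const Parsed &query) {
  return mol.text.find(query.text) != std::string::npos;
}

// Returns the number of (molecule, pattern) pairs that match.
std::size_t countHits(const std::vector<std::string> &smiles,
                      const std::vector<std::string> &smarts) {
  // Build every query exactly once, before the molecule loop.
  std::vector<std::unique_ptr<Parsed>> patterns;
  patterns.reserve(smarts.size());
  for (const std::string &s : smarts) {
    patterns.push_back(parse(s));
    // A real parser may signal failure on bad input; this stand-in never does.
    assert(patterns.back() != nullptr);
  }

  std::size_t hits = 0;
  for (const std::string &smi : smiles) {
    // Only one molecule is alive at a time, so memory use stays flat
    // however many molecules are processed.
    std::unique_ptr<Parsed> mol = parse(smi);
    for (const auto &pattern : patterns) {
      if (matches(*mol, *pattern)) ++hits;
    }
  }  // mol and, later, patterns are freed automatically: no manual delete
  return hits;
}
```

Holding the parsed objects in std::unique_ptr also removes the paired
delete calls, but that part is a matter of taste; the speedup comes
from moving the query construction out of the inner loop.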
-greg
[1] Note: if you have a set of molecules you process over and over
again, there are some time-saving tricks for working with them. One is
to process them once and then save them in binary form; the other is
to process them once, output the RDKit canonical SMILES, and then
rebuild molecules from that using only partial sanitization.