Hi Paul,

I haven't used it on PubChem or anything of that size. However, based on
validation work with 500 queries against 100K DB compounds (i.e. at most 50
million comparisons, see …), I guess you are looking at something like
10-30 seconds for 50 million compounds, depending on the size of the
substructure you are searching with. This is using the cartridge,
obviously. These numbers are based on a 2.5 GHz Core 2 Duo inside a
MacBook Pro.
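The arithmetic behind that estimate can be checked quickly. 500 queries against 100K compounds is the same 50 million comparisons as one query against 50 million compounds, and a 10-30 second range implies roughly 1.7 to 5 million compounds screened per second. This is a sketch only: the linear-scaling assumption and the names below are illustrative, not part of the benchmark, and no timing of the validation run itself is given here.

```python
# Back-of-envelope check of the numbers quoted above.
# Assumption (not measured): search time scales roughly linearly with the
# number of query/compound comparisons.

QUERIES = 500
DB_COMPOUNDS = 100_000
TARGET_COMPOUNDS = 50_000_000
ESTIMATE_SECONDS = (10, 30)  # estimated range for one query over 50M compounds


def comparisons(n_queries: int, n_compounds: int) -> int:
    """Upper bound on query/compound pairs checked."""
    return n_queries * n_compounds


def implied_rate(n_compounds: int, seconds: float) -> float:
    """Compounds per second implied by a timing, under the linear assumption."""
    return n_compounds / seconds


if __name__ == "__main__":
    print(f"validation run: {comparisons(QUERIES, DB_COMPOUNDS):,} comparisons")
    fast, slow = ESTIMATE_SECONDS
    print(f"implied rate: {implied_rate(TARGET_COMPOUNDS, slow):,.0f} to "
          f"{implied_rate(TARGET_COMPOUNDS, fast):,.0f} compounds/s")
```

Run as a script, this prints 50,000,000 comparisons and an implied rate of 1,666,667 to 5,000,000 compounds/s. Real cartridge searches use an index screen, so actual throughput will vary with the query.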

There is also some useful information on this in

http://code.google.com/p/rdkit/wiki/DatabaseCreation2

If you look there, you will see timings for the eMolecules DB (~5 million
compounds). As you can see, search times depend heavily on the query. The
method of speeding things up by using subset pages is really nice when you
are using web pages to display results.
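The subset-page idea can be illustrated in memory: if hits are produced lazily, filling one page of results only requires scanning until that page is full, rather than finding every hit first. This is an analogy only. In the database the equivalent would typically be a LIMIT/OFFSET-style query, and `matches` below is a hypothetical stand-in for the cartridge's substructure test; integers stand in for molecules.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_hits(records: Iterable[T], matches: Callable[[T], bool]) -> Iterator[T]:
    """Lazily yield records that pass `matches` (a hypothetical substructure test)."""
    for rec in records:
        if matches(rec):
            yield rec


def get_page(records: Iterable[T], matches: Callable[[T], bool],
             page: int, page_size: int = 20) -> List[T]:
    """Return hits for one 0-based page; scanning stops once the page is full."""
    start = page * page_size
    return list(islice(iter_hits(records, matches), start, start + page_size))


if __name__ == "__main__":
    # Toy data: integers stand in for molecules, divisibility for a match.
    print(get_page(range(10_000_000), lambda x: x % 7 == 0, page=1, page_size=5))
```

With the toy data this prints `[35, 42, 49, 56, 63]` after testing only the first 64 of the ten million records, which is why showing one page at a time feels fast even on a large collection.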

Please note the loading and fingerprint creation timings at the top of 
that page ;-)

Hope this helps.

Cheers
Nik

[email protected] 
01/19/2011 04:32 PM

To
RDKit Discuss <[email protected]>
cc

Subject
[Rdkit-discuss] PubChem search

Dear RDKit users,

has anyone used RDKit for local searches of PubChem?
Can anyone give approximate performance numbers, i.e. how long a
substructure search takes for, let's say, 50 million compounds?

Best regards,
Paul

This message and any attachment are confidential and may be privileged or
otherwise protected from disclosure. If you are not the intended recipient,
you must not copy this message or attachment or disclose the contents to
any other person. If you have received this transmission in error, please
notify the sender immediately and delete the message and any attachment
from your system. Merck KGaA, Darmstadt, Germany and any of its
subsidiaries do not accept liability for any omissions or errors in this
message which may arise as a result of E-Mail-transmission or for damages
resulting from any unauthorized changes of the content of this message and
any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
subsidiaries do not guarantee that this message is free of viruses and does
not accept liability for any damages caused by any virus transmitted
therewith.

Click http://disclaimer.merck.de to access the German, French, Spanish and
Portuguese versions of this disclaimer.


------------------------------------------------------------------------------
Protect Your Site and Customers from Malware Attacks
Learn about various malware tactics and how to avoid them. Understand 
malware threats, the impact they can have on your business, and how you 
can protect your company and customers by using code signing.
http://p.sf.net/sfu/oracle-sfdevnl
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


_________________________

CONFIDENTIALITY NOTICE

The information contained in this e-mail message is intended only for the 
exclusive use of the individual or entity named above and may contain 
information that is privileged, confidential or exempt from disclosure 
under applicable law. If the reader of this message is not the intended 
recipient, or the employee or agent responsible for delivery of the 
message to the intended recipient, you are hereby notified that any 
dissemination, distribution or copying of this communication is strictly 
prohibited. If you have received this communication in error, please 
notify the sender immediately by e-mail and delete the material from any 
computer.  Thank you.