Ah OK, I see ;)

This I don't know, sorry.

I have implemented a de novo design tool based on rdkit that made heavy use of SMARTS matching accessed via Python. In my experience, substructure matching will probably not be the performance bottleneck (depending, of course, on how sophisticated your scoring function is).
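Roughly, the Python-side pattern looks like the following. This is a minimal sketch that assumes the queries are plain SMARTS used as reject filters. The pattern set and the helper name `passes_filters` are made up for illustration. The idea is to parse each SMARTS once with `Chem.MolFromSmarts` and reuse the query molecules for every generated structure, so only the matching itself runs per molecule.

```python
from rdkit import Chem

# Hypothetical filter set (illustration only): substructures a generated
# molecule must NOT contain.
UNWANTED_SMARTS = {
    "acyl_halide": "C(=O)[Cl,Br,I]",
    "nitro": "[N+](=O)[O-]",
    "azo": "N=N",
}

# Parse each SMARTS once, up front; MolFromSmarts returns None on a syntax error.
UNWANTED = {}
for name, smarts in UNWANTED_SMARTS.items():
    query = Chem.MolFromSmarts(smarts)
    if query is None:
        raise ValueError(f"invalid SMARTS for {name}: {smarts}")
    UNWANTED[name] = query


def passes_filters(mol):
    """Return True if mol matches none of the precompiled unwanted patterns."""
    # any() stops at the first hit, so patterns that reject most often
    # can be listed first.
    return not any(mol.HasSubstructMatch(q) for q in UNWANTED.values())


if __name__ == "__main__":
    for smi in ["CCO", "CC(=O)Cl", "c1ccccc1[N+](=O)[O-]", "c1ccccc1N=Nc1ccccc1"]:
        print(smi, passes_filters(Chem.MolFromSmiles(smi)))
```

Keeping the parsed queries around means the per-molecule cost is just the `HasSubstructMatch` calls, which run in RDKit's C++ core; timing this loop against the scoring function should show which of the two dominates.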

Something else you might want to consider regarding the performance of SMARTS queries is how the SMARTS themselves are written. Here is a quote from the Daylight theory page http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html:


Efficiency Considerations
The Daylight 4.x SMARTS Toolkit provides a function, dt_smarts_opt(), which automatically optimizes a SMARTS by reordering, expanding, and/or consolidating atom and bond expressions. Programs which use this feature (e.g. the Merlin program) can be expected to be near optimal in terms of the time used to search typical organic structures.
When this optimization method is not used, there are some things which can be done to facilitate efficient (fast) searching operations using SMARTS. It is important to recognize that SMARTS target strings are processed in strictly left-to-right order. For this reason, substantial gains in speed can be achieved by following these guidelines:

  • Uncommon atoms or bond arrangements should be placed early in SMARTS targets.
  • In an "and-expression", the less common atom or bond specifications should be placed early.
  • In an "or-expression", the less common atom or bond specifications should be placed last.
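Whether these guidelines carry over to RDKit is not established here, so one way to check is to time a SMARTS against a reordered but equivalent version on your own molecules. A minimal sketch, assuming RDKit's Python API; the SMARTS pairs, the test molecules, and the helpers `match_sets` and `best_time` are made up for illustration.

```python
import timeit

from rdkit import Chem

# Hypothetical test set; in practice use a sample of your own generated molecules.
SMILES = ["CCO", "c1ccc(Br)cc1", "C[N+](C)(C)C", "ClCCCl",
          "CC(=O)Nc1ccc(O)cc1", "c1ccc2ccccc2c1"] * 200
MOLS = [Chem.MolFromSmiles(s) for s in SMILES]

# Hypothetical (original, reordered) pairs that should match exactly the same atoms.
PAIRS = [
    ("c1ccccc1Br", "Brc1ccccc1"),  # uncommon atom (Br) moved to the front
    ("[#7;+]", "[+;#7]"),          # and-expression: rarer primitive (charge) first
    ("[Cl,C]", "[C,Cl]"),          # or-expression: rarer alternative (Cl) last
]


def match_sets(smarts, mols):
    """Matched atom-index sets per molecule, independent of query atom order."""
    query = Chem.MolFromSmarts(smarts)
    return [{frozenset(match) for match in m.GetSubstructMatches(query)} for m in mols]


def best_time(smarts, mols, repeat=5):
    """Best-of-`repeat` wall time (s) for one HasSubstructMatch pass over mols."""
    query = Chem.MolFromSmarts(smarts)
    return min(timeit.repeat(lambda: [m.HasSubstructMatch(query) for m in mols],
                             number=1, repeat=repeat))


if __name__ == "__main__":
    for original, reordered in PAIRS:
        # Reordering must not change what the query matches.
        assert match_sets(original, MOLS) == match_sets(reordered, MOLS)
        print(f"{original:>12} {best_time(original, MOLS):.4f}s   "
              f"{reordered:>12} {best_time(reordered, MOLS):.4f}s")
```

Only a difference that is consistent across runs is worth acting on; the `assert` guards against a reordering that accidentally changes the meaning of the query.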


I understand that the SMARTS you want to use have already been designed, and of course what Daylight states refers to their own implementation. But (a) I think it applies to the rdkit implementation as well (please correct me if I'm wrong here, Greg), and (b) if performance is really critical, this could be another point to look at.

Best,
Markus


On 04/25/2013 04:13 PM, Nicholas Firth wrote:
Hi Markus,

Thanks for the quick reply. I don't think I worded my question very well, though.

I would like to know the most efficient way to run the SMARTS queries: the way I suggested, an alternative approach in Python, or passing them through to the C++ side of things. The reason is that I work on de novo design and generate a lot of molecules, so efficiency is quite important.

Sorry for the confusion,
Best,
Nick

Nicholas C. Firth | PhD Student | Cancer Therapeutics
The Institute of Cancer Research | 15 Cotswold Road | Belmont | Sutton | Surrey | SM2 5NG




On 25 Apr 2013, at 14:52, Markus Hartenfeller <[email protected]> wrote:

m.HasSubstructMatch



--
Markus Hartenfeller
Chemoinformatics Specialist
Molecular Health GmbH
Belfortstr. 2
69115 Heidelberg
Germany
www.molecularhealth.com

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
