Dear Sune,

On Tue, Apr 26, 2011 at 10:33 PM,  <[email protected]> wrote:
> I am new to RDKit and have a few questions that I hope you can help answer.
> I am currently trying to find a suitable open source cartridge and database
> alternative to the commercial chemistry cartridges that only run on Oracle
> or Microsoft SQL Server. What I am hoping to find is an open source
> chemistry cartridge/database setup that has few dependencies, correct
> representation of chemistry and scale to several million compounds.

I think the RDKit has what you're looking for. I'll give my
(admittedly biased) perspective below.

> I would therefore greatly appreciate any information and experience from
> users and developers of RDKit on any of the following questions
>
> 1. What level of “correctness” in terms of chemistry does the current
> release of RDKit show? Here, I am thinking of things like conversion errors
> between different formats eg. SMILES to MDL molfile back to SMILES plus
> correct identification of hetero aromatic rings, chirality etc.

The RDKit places a very strong emphasis on chemical correctness. This
means ensuring that input structures make chemical sense (i.e. no
five-coordinate neutral carbon) as well as correctly handling
tetrahedral chirality and stereochemistry about double bonds. I've
done numerous smiles -> mol -> smiles and mol -> smiles -> mol ->
smiles tests over the years to identify problems and I fix things
whenever bugs are found. Having said that, stereochemistry is not easy
and I'm sure there are still bugs lurking in the code. There are also
some known problems with:
1) handling of stereochemistry in chemical reactions (not currently
preserved in many circumstances)
2) the generation of canonical SMILES for the cis- and
trans-arrangements in non-chiral substituted ring systems like
cyclohexanes (e.g. C[C@H]1CC[C@H](C)CC1 vs C[C@H]1CC[C@@H](C)CC1).
Here the generated SMILES should be correct, but it is not canonical.
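
To make the round-trip testing described above concrete, here is a minimal sketch using the RDKit Python wrappers. The helper name `roundtrip` and the example SMILES are illustrative assumptions, not taken from the actual RDKit test suite. The import is guarded only so the snippet can be loaded on a machine without the RDKit.

```python
try:
    from rdkit import Chem
except ImportError:  # the RDKit is not installed; roundtrip() cannot be used
    Chem = None


def roundtrip(smiles):
    """SMILES -> mol -> SMILES -> mol -> SMILES (hypothetical helper).

    Returns the two generated SMILES strings, or None when the input
    does not survive sanitization (e.g. an impossible valence).
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    first = Chem.MolToSmiles(mol, isomericSmiles=True)
    second = Chem.MolToSmiles(Chem.MolFromSmiles(first), isomericSmiles=True)
    return first, second


if __name__ == "__main__" and Chem is not None:
    # Illustrative inputs: an alcohol, a heteroaromatic ring, a chiral centre
    for smi in ["OCC", "c1ccncc1", "N[C@@H](C)C(=O)O"]:
        first, second = roundtrip(smi)
        print(smi, first, "stable" if first == second else "CHANGED")
    # A carbon with five neighbours is rejected during sanitization
    print(roundtrip("C(C)(C)(C)(C)C"))
```

A mismatch between the first and second output string is the kind of symptom such tests are meant to surface; the substituted-cyclohexane case in item 2 is one where a mismatch would not be surprising.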

> 2. What are the users experience in terms of the RDKit cartridge running on
> PostgreSQL in terms of stability, scalability, speed and ease of use? I
> would also be very happy to know if anybody is currently using
> RDKit/Postgres in a production environment.

There are some wiki posts about the cartridge here:
http://code.google.com/p/rdkit/w/list?q=label:cartridge
Perhaps most relevant to your question is this one:
http://code.google.com/p/rdkit/wiki/DatabaseCreation2
which looks at performance for loading and searching a database
containing the emolecules catalog (about 5 million compounds).

We've publicly discussed the fact that within NIBR we use the
cartridge in production for similarity searching and related tasks.
The strength of the cartridge is that it provides access to a number
of different types of fingerprint (both bit-vector and count-based
fps) and that it's easy to add new fingerprints.
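
To illustrate the difference between the two fingerprint types mentioned above, here is a small pure-Python sketch of Tanimoto similarity for a bit-vector fingerprint (a set of on-bit indices) and for a count-based fingerprint (a feature-to-count mapping). The function names and the min/max generalization for counts are assumptions made for illustration; this is not the cartridge's own code.

```python
def tanimoto_bits(a, b):
    """Tanimoto similarity for bit-vector fps given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def tanimoto_counts(a, b):
    """Tanimoto generalized to count fps given as {feature: count} dicts.

    Uses sum(min counts) / sum(max counts), which reduces to the
    bit-vector form when every count is 0 or 1.
    """
    keys = set(a) | set(b)
    shared = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    total = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return shared / total if total else 0.0


if __name__ == "__main__":
    print(tanimoto_bits({1, 2, 3}, {2, 3, 4}))                       # 0.5
    print(tanimoto_counts({"a": 2, "b": 1}, {"a": 1, "c": 1}))       # 0.25
```

The count-based form distinguishes molecules that contain the same features a different number of times, which a bit vector cannot do.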

We do not use the cartridge for large-scale substructure searching. We
have a really good system for this in-house already. As you'll see
from the wiki page above, performance for SMILES-based SSS queries
(i.e. ones without query features) is pretty good, but this breaks
down quickly as soon as you start including query features by using
SMARTS. This is a result of limitations in the current fingerprinting
algorithm used for SSS queries. This is fixable, but it's not a high
priority for me at the moment.
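
As a rough illustration of why the fingerprint matters for SSS speed, here is a generic sketch of fingerprint screening: a molecule can only contain the query if every bit set in the query fingerprint is also set in the molecule fingerprint. The toy integer fingerprints and function names are assumptions for illustration, and this is the general technique rather than the cartridge's actual algorithm.

```python
def passes_screen(query_fp, mol_fp):
    """True if every bit set in query_fp is also set in mol_fp."""
    return (query_fp & mol_fp) == query_fp


def screen(query_fp, database):
    """Return ids of molecules that still need a full atom-by-atom match."""
    return [mol_id for mol_id, fp in database.items()
            if passes_screen(query_fp, fp)]


if __name__ == "__main__":
    # Toy fingerprints stored as integers used as bit masks (made-up values)
    database = {"mol1": 0b101101, "mol2": 0b001100, "mol3": 0b111111}

    # A query that sets many bits eliminates candidates cheaply
    print(screen(0b101001, database))   # ['mol1', 'mol3']

    # A query that sets few bits passes nearly everything, so the
    # expensive substructure match must run on almost the whole table
    print(screen(0b000100, database))   # ['mol1', 'mol2', 'mol3']
```

If query features in a SMARTS pattern contribute few or no bits to the query fingerprint, the screen behaves like the second case, which would be consistent with the slowdown described above.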

> 3. What are the main limitations to the RDKit cartridge compared to the
> commercial alternatives like Accelrys Direct or Daycart from Daylight?

I haven't done a head-to-head comparison on performance (such things
are difficult to do), but I would guess that both commercial
cartridges (certainly the one from Daylight) have better
substructure-search performance. I don't think any of the commercial
cartridges support as broad a spectrum of similarity metrics.

> 4. What applications and tools are users building with RDKit?

I can only answer for work I've been involved in. Aside from the
RDKit itself and the cartridge:
1) we've released a set of RDKit-based nodes for Knime (work done
together with knime.com).
2) we've published on the use of the RDKit to characterize atom
environments and predict F chemical shifts
3) we've discussed using the system to score and filter docking poses
based on pharmacophoric features, align molecules, and do
building-block selection

Hope this helps,
-greg

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
