Thanks for your reply, Greg.

On 04/13/2013 08:01 AM, Greg Landrum wrote:
Hi Markus,

On Fri, Apr 12, 2013 at 9:30 AM, Markus Hartenfeller
<[email protected]>  wrote:
I wanted to ask if anybody is currently working on an implementation of a
tautomer enumeration (and canonicalization) in RDKit.
I'm not, but definitely think it would be interesting.

This JCAMD paper from Sitzmann & Ihlenfeldt from 2010 describes an
implementation that looks fairly solid at first glance, plus: they
published the rSMARTS transformations.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2886898/

I admit I haven't yet read it carefully. Did anyone have a look at it
already? One thing that came to my mind is that there might be a problem
with differences in the aromaticity detection between CACTVS and RDKit,
because the transforms partly depend on aromaticity flags. In addition, the
enumeration at some point would benefit from a solid way of identifying
generated duplicate molecules, so this might be another building block that
is missing.
You could just do a canonical smiles filter for that.
That is certainly the straightforward way, and it is how I filter duplicates at the moment. I was just wondering how reliable it is, because I thought I remembered you mentioning cases where this strategy fails. But if that happens only rarely (which matches my experience so far), it should be fine.
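
A minimal sketch of the canonical-SMILES duplicate filter discussed here. The helper names `unique_by_key` and `rdkit_canonical_smiles` are hypothetical; the only RDKit calls are `Chem.MolFromSmiles` and `Chem.MolToSmiles`. The filter is only as reliable as the canonicalization behind it, so any structure that ends up with two different canonical SMILES would slip through:

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def unique_by_key(items: Iterable[T], key: Callable[[T], Hashable]) -> List[T]:
    """Keep the first item seen for each key, preserving input order."""
    seen = set()
    unique = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            unique.append(item)
    return unique


def rdkit_canonical_smiles(smiles: str) -> str:
    """Canonical SMILES via RDKit (imported lazily, so RDKit is only
    needed when this key function is actually used)."""
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Chem.MolToSmiles(mol)
```

With RDKit installed, `unique_by_key(["OCC", "CCO", "c1ccccc1"], key=rdkit_canonical_smiles)` keeps one entry for ethanol and one for benzene. Distinct tautomers generally have distinct canonical SMILES, so this removes repeated copies of the same tautomer during enumeration; it does not merge different tautomers.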

Any other thoughts on why a re-implementation in RDKit might be problematic?
I read the paper when it came out and don't remember seeing any
absolute blockers. When I skimmed it yesterday that impression held: I
think it should be possible.

It will be some work for sure.

that's certainly true.

Note: another method for tautomer canonicalization (but not
enumeration) is to convert to InChI and back. This is similar to
Noel's "canonical smiles using inchi" idea. The approach may be
somewhat fragile (I'm not convinced that RDKit's InChI->molecule
implementation is the best), but is worth considering.
Thanks for this hint. I had already tried it out. It works to a certain extent, but InChI only normalizes some cases of tautomerism.

Here is an example where it fails:

A = CN4CCN(c2nc1cc(Cl)ccc1[nH]c3ccccc23)CC4
B = CN4CCN(c2[nH]c1cc(Cl)ccc1nc3ccccc23)CC4

InChI=1S/C18H19ClN4/c1-22-8-10-23(11-9-22)18-14-4-2-3-5-15(14)20-16-7-6-13(19)12-17(16)21-18/h2-7,12,20H,8-11H2,1H3
InChI=1S/C18H19ClN4/c1-22-8-10-23(11-9-22)18-14-4-2-3-5-15(14)20-16-7-6-13(19)12-17(16)21-18/h2-7,12,21H,8-11H2,1H3

The same happens when the options -KET and -15T are enabled:

InChI=1/C18H19ClN4/c1-22-8-10-23(11-9-22)18-14-4-2-3-5-15(14)20-16-7-6-13(19)12-17(16)21-18/h2-7,12,20H,8-11H2,1H3
InChI=1/C18H19ClN4/c1-22-8-10-23(11-9-22)18-14-4-2-3-5-15(14)20-16-7-6-13(19)12-17(16)21-18/h2-7,12,21H,8-11H2,1H3
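
To see where the two strings diverge, they can be compared layer by layer. This is a small sketch in plain Python (the helper `differing_layers` is hypothetical and not part of RDKit or the InChI software); it splits each InChI on `/` and reports the layers that differ:

```python
from typing import List, Tuple


def differing_layers(inchi_a: str, inchi_b: str) -> List[Tuple[str, str]]:
    """Return (layer_a, layer_b) pairs for positions where the layers differ.

    Assumes both strings have the same number of '/'-separated layers.
    """
    layers_a = inchi_a.split("/")
    layers_b = inchi_b.split("/")
    if len(layers_a) != len(layers_b):
        raise ValueError("InChIs have different numbers of layers")
    return [(a, b) for a, b in zip(layers_a, layers_b) if a != b]


# Standard InChIs of tautomers A and B, copied from the example above.
INCHI_A = (
    "InChI=1S/C18H19ClN4/c1-22-8-10-23(11-9-22)18-14-4-2-3-5-15(14)"
    "20-16-7-6-13(19)12-17(16)21-18/h2-7,12,20H,8-11H2,1H3"
)
INCHI_B = (
    "InChI=1S/C18H19ClN4/c1-22-8-10-23(11-9-22)18-14-4-2-3-5-15(14)"
    "20-16-7-6-13(19)12-17(16)21-18/h2-7,12,21H,8-11H2,1H3"
)

diff = differing_layers(INCHI_A, INCHI_B)
print(diff)
# [('h2-7,12,20H,8-11H2,1H3', 'h2-7,12,21H,8-11H2,1H3')]
```

The formula and connectivity layers are identical; only the hydrogen layer differs, with the proton assigned to atom 20 in one string and to atom 21 in the other.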


In either case, converting the InChIs back to molecules yields the same tautomers that were put in. That's why I started thinking about implementing a more general routine. I haven't made up my mind yet whether I will start implementing something, but if I end up with a presentable solution I will certainly let you know. If someone is currently working on this or would be interested in joining forces, please let me know.

-greg

Best,
Markus

--
*Markus Hartenfeller*
Chemoinformatics Specialist
Molecular Health GmbH
Belfortstr. 2
69115 Heidelberg
Germany
Tel: +49 6221 43851 209
Fax: +49 6221 43851 100
Email: [email protected]
www.molecularhealth.com
