Thanks for your reply, Greg.
On 04/13/2013 08:01 AM, Greg Landrum wrote:
Hi Markus,
On Fri, Apr 12, 2013 at 9:30 AM, Markus Hartenfeller
<[email protected]> wrote:
I wanted to ask if anybody is currently working on an implementation of a
tautomer enumeration (and canonicalization) in RDKit.
I'm not, but definitely think it would be interesting.
This JCAMD paper by Sitzmann & Ihlenfeldt from 2010 describes an
implementation that looks fairly solid at first glance; in addition, they
published their rSMARTS transformations.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2886898/
I admit I haven't read it carefully yet. Has anyone already had a look at
it? One potential problem that came to mind is differences in aromaticity
detection between CACTVS and RDKit, because some of the transforms depend on
aromaticity flags. In addition, the enumeration would at some point need a
solid way of identifying duplicates among the generated molecules, so this
might be another missing building block.
You could just do a canonical smiles filter for that.
That is certainly the straightforward way, and it is also how I currently
filter duplicates. I was just wondering how reliable it is, because I seem
to remember you mentioning cases where this strategy fails. But if those
cases are rare (that has been my experience so far), it should be fine.
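To make the duplicate question concrete: enumeration can be written as a breadth-first closure that applies every transform to every newly generated structure until nothing new appears, with a canonical key deciding what counts as "new". This is a minimal sketch under assumptions: the function name enumerate_closure and the max_size cap are hypothetical, the transforms are toy stand-ins for the published rSMARTS rules (which are not implemented here), and in RDKit the key would typically be canonical SMILES, e.g. Chem.MolToSmiles(Chem.MolFromSmiles(smi)). The example uses plain strings so it runs without RDKit.

```python
from collections import deque
from typing import Callable, Iterable, List


def enumerate_closure(
    start: str,
    transforms: Iterable[Callable[[str], List[str]]],
    key: Callable[[str], str],
    max_size: int = 1000,  # hypothetical safety cap on the number of results
) -> List[str]:
    """Breadth-first closure of `start` under `transforms`.

    Structures whose key was already seen are treated as duplicates and
    are neither returned nor expanded further.
    """
    rules = list(transforms)
    seen = {key(start)}
    result = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for rule in rules:
            for product in rule(current):
                k = key(product)
                if k in seen:
                    continue  # duplicate: already generated in some form
                seen.add(k)
                result.append(product)
                if len(result) >= max_size:
                    return result
                queue.append(product)
    return result


# Toy stand-in for a tautomer rule: turn any single "A" into a "B".
def flip_one_a(s: str) -> List[str]:
    return [s[:i] + "B" + s[i + 1:] for i, ch in enumerate(s) if ch == "A"]


print(enumerate_closure("AA", [flip_one_a], key=lambda s: s))
# ['AA', 'BA', 'AB', 'BB']
print(enumerate_closure("AA", [flip_one_a], key=lambda s: "".join(sorted(s))))
# ['AA', 'BA', 'BB']
```

The second call shows that the result depends entirely on the key: a key coarser than true identity silently drops structures, and one finer than true identity lets duplicates through.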
Any other thoughts on why a re-implementation in RDKit might be problematic?
I read the paper when it came out and don't remember seeing any
absolute blockers. When I skimmed it yesterday, that impression held: I
think it should be possible.
It will be some work for sure.
that's certainly true.
Note: another method for tautomer canonicalization (but not
enumeration) is to convert to inchi and back. This is similar to
Noel's "canonical smiles using inchi" idea. The approach may be
somewhat fragile (I'm not convinced that the RDKit's inchi->molecule
implementation is the best), but is worth considering.
Thanks for this hint. I had already tried it out. It works to a certain
extent, but InChIs only normalize some cases of tautomerism.
Here is an example where it fails:
A = CN4CCN(c2nc1cc(Cl)ccc1[nH]c3ccccc23)CC4
B = CN4CCN(c2[nH]c1cc(Cl)ccc1nc3ccccc23)CC4
InChI=1S/C18H19ClN4/c1-22-8-10-23(11-9-22)18-14-4-2-3-5-15(14)20-16-7-6-13(19)12-17(16)21-18/h2-7,12,20H,8-11H2,1H3
InChI=1S/C18H19ClN4/c1-22-8-10-23(11-9-22)18-14-4-2-3-5-15(14)20-16-7-6-13(19)12-17(16)21-18/h2-7,12,21H,8-11H2,1H3
The same happens when the -KET and -15T options are enabled:
InChI=1/C18H19ClN4/c1-22-8-10-23(11-9-22)18-14-4-2-3-5-15(14)20-16-7-6-13(19)12-17(16)21-18/h2-7,12,20H,8-11H2,1H3
InChI=1/C18H19ClN4/c1-22-8-10-23(11-9-22)18-14-4-2-3-5-15(14)20-16-7-6-13(19)12-17(16)21-18/h2-7,12,21H,8-11H2,1H3
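A quick way to see where the two InChIs differ is to split them on "/" and compare layer by layer. This is a plain-string sketch (the helper name differing_layers is hypothetical, and it compares layers by position rather than parsing the InChI grammar). For A and B, the formula and connection layers are identical and only the hydrogen layer differs: the hydrogen is placed on atom 20 in one and on atom 21 in the other, rather than the two atoms appearing together in a mobile-H group of the form (H,...).

```python
from itertools import zip_longest
from typing import List, Tuple


def differing_layers(inchi_a: str, inchi_b: str) -> List[Tuple[str, str]]:
    """Return the '/'-separated InChI layers that differ, compared by position."""
    return [
        (x, y)
        for x, y in zip_longest(inchi_a.split("/"), inchi_b.split("/"), fillvalue="")
        if x != y
    ]


A = ("InChI=1S/C18H19ClN4/c1-22-8-10-23(11-9-22)18-14-4-2-3-5-15(14)20-16-7-6-13(19)"
     "12-17(16)21-18/h2-7,12,20H,8-11H2,1H3")
B = ("InChI=1S/C18H19ClN4/c1-22-8-10-23(11-9-22)18-14-4-2-3-5-15(14)20-16-7-6-13(19)"
     "12-17(16)21-18/h2-7,12,21H,8-11H2,1H3")

for layer_a, layer_b in differing_layers(A, B):
    print(layer_a, "<->", layer_b)
# h2-7,12,20H,8-11H2,1H3 <-> h2-7,12,21H,8-11H2,1H3
```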
In either case, converting the InChIs back to molecules yields the same
tautomers that were put in. That's why I started to think about implementing
a more general routine. I haven't made up my mind yet whether I will start
implementing something, but if I end up with a presentable solution I will
certainly let you know. If someone is currently working on it or would be
interested in joining forces, please let me know.
-greg
Best,
Markus
--
*Markus Hartenfeller*
Chemoinformatics Specialist
Molecular Health GmbH
Belfortstr. 2
69115 Heidelberg
Germany
Tel: +49 6221 43851 209
Fax: +49 6221 43851 100
Email: [email protected]
www.molecularhealth.com
----------------------------------------------------------
Molecular Health GmbH
Geschaeftsfuehrer: Dr. Stephan Brock/
Dr. Friedrich von Bohlen und Halbach
Sitz der Gesellschaft: Heidelberg
Handelsregister: Amtsgericht Mannheim - HRB 338037
----------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss