Hi Simon,

On Tue, Sep 27, 2011 at 9:21 AM, Simon Saubern <[email protected]> wrote:
>
> The recent updates to the way explicit hydrogens are handled in the RDKit
> nodes for KNIME  http://goo.gl/DK0FS have dramatically improved the number
> of correct matches that we observe when using the PAINS filters workflow
> http://goo.gl/T9mT2 .
>
> Against the reference set from WEHI, we're now seeing 652 matches (up from
> 329), but we also now get 231 false positives where we were getting none
> before.
>
> Attached is a tab-separated file containing the mismatches (regID, SMILES,
> SMARTS, smartsID).
>
> The SMARTS strings come from Raj's blog: http://blog.rguha.net/?p=850.
>
> Let us know if you need additional info to diagnose what's going on.

Thanks for providing all the data; that really helps.

I think I've got at least part of it figured out and fixed. There was
a problem with the way explicit Hs were being merged into the atoms
they are connected to. This led to bits of query like "C([#1])[#1]"
being converted to "[C&!H0]", which requires only one attached H
instead of two. This has been fixed in the RDKit itself.
I also updated the relevant pieces of the KNIME nodes; the changes
should be in today's nightly build.
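To illustrate why collapsing "C([#1])[#1]" to "[C&!H0]" yields false positives, here is a toy Python model, not RDKit code. The Atom class and every function name are hypothetical, made up for this illustration. It also assumes that the buggy merge discarded the H count entirely. A target atom is reduced to its element and total H count, which is what the SMARTS H primitive tests. The count-preserving form adds one "!Hk" term for each k below the required number of Hs; it is one equivalent way to write the query, not necessarily the exact string the fixed RDKit emits.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Atom:
    """Toy target atom: element symbol plus total attached H count."""
    symbol: str
    total_h: int


AtomQuery = Callable[[Atom], bool]


def explicit_h_query(element: str, n_h: int) -> AtomQuery:
    """Intended meaning of e.g. C([#1])[#1]: the atom carries at least n_h Hs."""
    return lambda atom: atom.symbol == element and atom.total_h >= n_h


def buggy_merge(element: str, n_h: int) -> AtomQuery:
    """Assumed buggy merge: H neighbours collapse to a single !H0 term, count lost."""
    return lambda atom: atom.symbol == element and atom.total_h != 0


def fixed_merge(element: str, n_h: int) -> AtomQuery:
    """Count-preserving merge: one !Hk term for every k < n_h."""
    return lambda atom: atom.symbol == element and all(
        atom.total_h != k for k in range(n_h)
    )


def merged_smarts(element: str, n_h: int) -> str:
    """SMARTS text for the count-preserving merge, e.g. [C&!H0&!H1] for two Hs."""
    return "[" + element + "".join(f"&!H{k}" for k in range(n_h)) + "]"


if __name__ == "__main__":
    # Compare the three interpretations on C, CH, CH2 and CH3 target atoms.
    for h in range(4):
        atom = Atom("C", h)
        print(
            f"CH{h}:",
            "intended", explicit_h_query("C", 2)(atom),
            "buggy", buggy_merge("C", 2)(atom),
            "fixed", fixed_merge("C", 2)(atom),
        )
    print(merged_smarts("C", 2))
```

In this model, the CH row is the only one where the buggy query disagrees with the intended one. It matches a carbon bearing a single H, which is the kind of spurious hit that would appear as a false positive.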

Please give the new version a try and let us know if there are still problems,
-greg

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
