Hi Nick,
On Thu, May 1, 2014 at 2:07 PM, Nicholas Firth <[email protected]> wrote:
>
> I have (yet another) question about the handling of SMARTS. I have a set
> of SMARTS (http://www.macinchem.org/reviews/pains/painsFilter.php) which
> I have been using to perform PAINS filters but I've just discovered some
> strange behaviour, I would expect a match to happen in the example below.
>
> >>> p = Chem.MolFromSmarts('[#6]-[#6](=[#16])-[#1]')
> >>> m = Chem.MolFromSmiles('CC=S')
> >>> m.HasSubstructMatch(p)
> False
>
That SMARTS, as written and as parsed, looks for an H atom that is
explicitly present in the molecule graph. Unless you're planning to run
AddHs() on everything, it won't match.
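To make the explicit-H point concrete, here is a minimal sketch (assuming RDKit is installed; the variable names are only illustrative) that runs the same query against the molecule with implicit Hs and against a copy with the Hs added as real atoms:

```python
from rdkit import Chem

# Query from the thread: the [#1] is a real hydrogen atom in the pattern.
p = Chem.MolFromSmarts('[#6]-[#6](=[#16])-[#1]')
m = Chem.MolFromSmiles('CC=S')

# Hs are implicit in m, so there is no H atom in the graph to match [#1].
implicit_match = m.HasSubstructMatch(p)

# AddHs() returns a copy in which every H is an explicit atom.
mh = Chem.AddHs(m)
explicit_match = mh.HasSubstructMatch(p)

print(implicit_match, explicit_match)
```

The first call should report no match and the second a match, which is why the pattern only behaves as expected if AddHs() is applied to every molecule first.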
You can get the behavior you expect using the mergeHs argument to
MolFromSmarts():
In [3]: p = Chem.MolFromSmarts('[#6]-[#6](=[#16])-[#1]',mergeHs=True)
In [4]: m = Chem.MolFromSmiles('CC=S')
In [5]: m.HasSubstructMatch(p)
Out[5]: True
This removes the [#1] from the graph and adds a [!H0] query to the attached
carbon:
In [6]: Chem.MolToSmarts(p)
Out[6]: '[#6]-[#6&!H0]=[#16]'
> This can be fixed using the alternative form of the SMARTS
>
> >>> p2 = Chem.MolFromSmarts('[#6]-[#6H](=[#16])')
> >>> m.HasSubstructMatch(p2)
> True
>
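This is roughly equivalent to what happens above. The difference is that
[#6H] queries for a C that has exactly one H attached, whereas the !H0
query produced by mergeHs accepts one or more. In this case that's fine,
since a neutral carbon that already has a single bond to C and a double
bond to S has room for only one H.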
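A small sketch of where the two forms diverge, assuming RDKit is installed. The single-atom patterns and the ethane test molecule are chosen here purely for illustration and are not part of the PAINS set; [#6H1] is the same query as [#6H] with the count written out:

```python
from rdkit import Chem

# "At least one H" (the kind of query mergeHs produces) versus
# "exactly one H" (the hand-written form).
at_least_one_h = Chem.MolFromSmarts('[#6;!H0]')
exactly_one_h = Chem.MolFromSmarts('[#6H1]')

thioaldehyde = Chem.MolFromSmiles('CC=S')  # contains a CH carbon
ethane = Chem.MolFromSmiles('CC')          # only CH3 carbons

results = {
    'thioaldehyde': (thioaldehyde.HasSubstructMatch(at_least_one_h),
                     thioaldehyde.HasSubstructMatch(exactly_one_h)),
    'ethane': (ethane.HasSubstructMatch(at_least_one_h),
               ethane.HasSubstructMatch(exactly_one_h)),
}
print(results)
```

Both queries hit the thioaldehyde, but only the !H0 form hits ethane, so the two spellings are interchangeable only when the matched atom cannot carry more than one H.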
-greg