Hi RDKitters,
I have (yet another) question about the handling of SMARTS. I have been using
a set of SMARTS (http://www.macinchem.org/reviews/pains/painsFilter.php) to
perform PAINS filtering, but I've just discovered some strange behaviour: I
would expect a match in the example below.
>>> from rdkit import Chem
>>> p = Chem.MolFromSmarts('[#6]-[#6](=[#16])-[#1]')
>>> m = Chem.MolFromSmiles('CC=S')
>>> m.HasSubstructMatch(p)
False
This can be fixed by using an alternative form of the SMARTS:
>>> p2 = Chem.MolFromSmarts('[#6]-[#6H](=[#16])')
>>> m.HasSubstructMatch(p2)
True
From some research (I can no longer find the link), it seems that [#1] is
reserved for more 'interesting' cases of hydrogen, for example:
>>> m = Chem.MolFromSmiles('CC(=S)[H]')
>>> m.HasSubstructMatch(p)
False
>>> m = Chem.MolFromSmiles('CC(=S)[2H]')
>>> m.HasSubstructMatch(p)
True
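To make the difference between these two cases concrete, here is a minimal sketch. It assumes the cause is that MolFromSmiles, with its default settings, removes ordinary explicit hydrogens while keeping isotope-labelled ones, so [#1] only has an atom to match in the deuterated molecule. The variable names (m_h, m_d, m_all_h) are just for illustration.

```python
from rdkit import Chem

p = Chem.MolFromSmarts('[#6]-[#6](=[#16])-[#1]')

# Assumption: default parsing strips the plain [H] but keeps the [2H],
# which would show up as a difference in the atom counts.
m_h = Chem.MolFromSmiles('CC(=S)[H]')
m_d = Chem.MolFromSmiles('CC(=S)[2H]')
print(m_h.GetNumAtoms(), m_d.GetNumAtoms())

# Adding hydrogens back as real atoms in the molecular graph gives the
# [#1] query atom something to match.
m_all_h = Chem.AddHs(Chem.MolFromSmiles('CC=S'))
print(m_all_h.HasSubstructMatch(p))
```

If that assumption holds, calling Chem.AddHs on every molecule before matching would be one workaround, at the cost of carrying explicit hydrogens through the whole filtering run.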
This also seems to change the result for the example which Greg posted in
http://sourceforge.net/p/rdkit/mailman/message/31650578/
>>> p1=Chem.MolFromSmarts('c2sccc2[#1]')
>>> mol=Chem.MolFromSmiles('Clc2sccc2[H]')
>>> mol.HasSubstructMatch(p1)
False
Firstly, is this expected behaviour? It's different from what I would expect,
and from how Pipeline Pilot behaves with SMARTS matching. Secondly, does
anyone know how to get the expected behaviour without rewriting all the
SMARTS?
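One avenue that might avoid rewriting the patterns by hand is Chem.MergeQueryHs, which is meant to fold explicit hydrogen query atoms into hydrogen-count requirements on their neighbours. The sketch below is untested against the full PAINS set and assumes that [#1] atoms written in this way are recognised as query hydrogens; p_merged is just an illustrative name.

```python
from rdkit import Chem

m = Chem.MolFromSmiles('CC=S')
p = Chem.MolFromSmarts('[#6]-[#6](=[#16])-[#1]')

# MergeQueryHs returns a new query in which explicit hydrogen query atoms
# are replaced by hydrogen-count constraints on the attached heavy atoms.
p_merged = Chem.MergeQueryHs(p)

# Inspect the rewritten pattern and compare matching before and after.
print(Chem.MolToSmarts(p_merged))
print(m.HasSubstructMatch(p), m.HasSubstructMatch(p_merged))
```

If this behaves as hoped, the same call could be applied once to each pattern as it is loaded, leaving the SMARTS file itself unchanged.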
Thanks in advance. Apologies for the long read.
Best,
Nick
Nicholas C. Firth | PhD Student | Cancer Therapeutics
The Institute of Cancer Research | 15 Cotswold Road | Belmont | Sutton | Surrey
| SM2 5NG
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss