Thanks Greg,

The last strange behaviour I've noticed that could trip up fellow users
concerns matching Kekulé versus aromatic representations of the same
molecule when SMARTS is matched against SMILES. Most surprisingly,
C1=CC=CC=C1 is not a substructure of itself, yet it does have c1ccccc1 as a
substructure (taking the left-hand term as SMILES and the right-hand term as
SMARTS in both cases).
Code to demonstrate what I mean is below:

> from rdkit import Chem
> aromatic_benzene_smiles = Chem.MolFromSmiles('c1ccccc1')
> aromatic_benzene_smarts = Chem.MolFromSmarts('c1ccccc1')
> kekule_benzene_smiles = Chem.MolFromSmiles('C1=CC=CC=C1')
> kekule_benzene_smarts = Chem.MolFromSmarts('C1=CC=CC=C1')
> aromatic_benzene_smiles.HasSubstructMatch(aromatic_benzene_smarts)
True
> aromatic_benzene_smiles.HasSubstructMatch(kekule_benzene_smiles)
True
> aromatic_benzene_smiles.HasSubstructMatch(kekule_benzene_smarts)
False
> kekule_benzene_smiles.HasSubstructMatch(kekule_benzene_smarts)
False
> kekule_benzene_smiles.HasSubstructMatch(aromatic_benzene_smiles)
True
> kekule_benzene_smiles.HasSubstructMatch(aromatic_benzene_smarts)
True

I think I can see why the behaviour differs: a double bond is not the same
thing as an aromatic bond. In the SMILES case a conversion can take place
because the context is complete, but in the SMARTS case it is not (or at
least might not be). I thought I'd point out the issue in any case. The
workaround is to always write atoms as explicitly aromatic in SMARTS if you
want them to match aromatic SMILES, rather than relying on the Kekulé
representation to sort it out for you.
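
As a minimal sketch of that workaround, the snippet below puts the three
query styles side by side against aromatic benzene. The variable names and
the results dictionary are only illustrative, and the expected values are
the ones from the session above rather than anything guaranteed across
RDKit versions.

```python
from rdkit import Chem

# Target molecule, parsed as SMILES (aromaticity is perceived on parsing).
benzene = Chem.MolFromSmiles('c1ccccc1')

# Kekule pattern parsed as SMARTS: 'C' is an aliphatic carbon and '=' is a
# double bond, so this does not match the aromatic ring.
kekule_smarts = Chem.MolFromSmarts('C1=CC=CC=C1')

# Workaround 1: write the query with explicitly aromatic atoms.
aromatic_smarts = Chem.MolFromSmarts('c1ccccc1')

# Workaround 2: parse the Kekule string as SMILES instead, so that it is
# converted to the aromatic form before being used as the query.
kekule_smiles = Chem.MolFromSmiles('C1=CC=CC=C1')

results = {
    'kekule_smarts': benzene.HasSubstructMatch(kekule_smarts),
    'aromatic_smarts': benzene.HasSubstructMatch(aromatic_smarts),
    'kekule_smiles': benzene.HasSubstructMatch(kekule_smiles),
}
print(results)
```

In each call the SMILES-derived molecule is the one HasSubstructMatch is
called on, and the query is the argument.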

Yours,

Toby Wright

--
InhibOx Ltd


On 6 March 2014 04:55, Greg Landrum <[email protected]> wrote:

>
>
> On Wed, Mar 5, 2014 at 4:03 PM, Toby Wright <[email protected]> wrote:
>
>>
>> This is probably related to the above so I thought I'd post it on this
>> thread. I am noticing inconsistent behaviour when a molecule created via
>> SMARTS that contains an 'or' statement has HasSubstructMatch called on it,
>> as opposed to it being the argument to HasSubstructMatch. A simple example
>> follows:
>>
>> > O_or_C = Chem.MolFromSmarts('[O,C]')
>> > O = Chem.MolFromSmiles('O')
>> > C = Chem.MolFromSmiles('C')
>> > O_or_C.HasSubstructMatch(O)
>> True
>> > O_or_C.HasSubstructMatch(C)
>> False
>> > O.HasSubstructMatch(O_or_C)
>> True
>> > C.HasSubstructMatch(O_or_C)
>> True
>>
>> We also see:
>> > C_or_O = Chem.MolFromSmarts('[C,O]')
>> > C_or_O.HasSubstructMatch(O)
>> False
>> > C_or_O.HasSubstructMatch(C)
>> True
>>
>> so the order of elements in a SMARTS 'or' statement changes the
>> behaviour, which is unexpected.
>>
>
> This is indeed related. This is a case I didn't cover above: the
> SMILES/SMARTS match. The behavior above is expected from the point of view
> of what's in the code, though I can understand how it may not make much
> sense from the perspective of someone using the code. :-) The above should
> probably return False in both cases.
>
> In general, one should probably expect that using the HasSubstructMatch()
> method of a molecule constructed from SMARTS is likely to produce "strange"
> results. Getting a general purpose query--query matcher to work is, as far
> as I can tell, a decidedly non-trivial problem.
>
> -greg
>
>
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
