If Ching's structures are correct, then the first two building blocks have two
saturated carbons each, while the third one does not (it happens to be
naphthalene).

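The original SMARTS pattern for the third building block only requires generic
carbons ([#6], aromatic or aliphatic) joined by any kind of bond (~), so it also
matches the two fused, partly saturated six-membered rings in the target
structure. To exclude them, the pattern should require aromatic carbons,
something like this: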
   [c]~1~[c]~[c]~[c]~[c]~2~[c]~1~[c]~[c]~[c]~[c]~2
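This small Python sketch shows why the [c] version stops matching. It evaluates only the two atom primitives on the ten carbons of the two fused six-membered rings in the target. Several pieces were worked out by hand from Ching's SMILES: the atom numbering (1-16 in SMILES order) and the ring membership. The helper names are hypothetical. The aromatic flags are an assumption about aromaticity perception: only the furan and thiophene rings are treated as aromatic.

```python
# Ching's target C1C=C2C(C=CC3=COC=C23)C2=CSC=C12, atoms numbered 1-16 in
# SMILES order (worked out by hand). Atom 9 is O, atom 14 is S, the rest are C.
ELEMENT = {i: "C" for i in range(1, 17)}
ELEMENT[9] = "O"
ELEMENT[14] = "S"

# ASSUMPTION: aromaticity perception flags only the furan ring (atoms 7-11)
# and the thiophene ring (atoms 12-16) as aromatic.
AROMATIC = set(range(7, 12)) | set(range(12, 17))

# The ten atoms of the two fused six-membered carbon rings (also by hand).
FUSED_SIX_RINGS = [1, 2, 3, 4, 5, 6, 7, 11, 12, 16]


def any_carbon(i):
    """SMARTS atom primitive [#6]: any carbon, aromatic or aliphatic."""
    return ELEMENT[i] == "C"


def aromatic_carbon(i):
    """SMARTS atom primitive [c]: an aromatic carbon only."""
    return ELEMENT[i] == "C" and i in AROMATIC


print(all(any_carbon(i) for i in FUSED_SIX_RINGS))            # True
print([i for i in FUSED_SIX_RINGS if not aromatic_carbon(i)])  # [1, 2, 3, 4, 5, 6]
```

Atoms 1-6 fail [c], so the aromatic-only pattern has nowhere to match in this target, while [#6] accepts all ten atoms. Bonds are ignored here because ~ matches any bond in both versions.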

Overall, I agree with Craig: these problems are not easy to solve with a general
solution, but depending on the level of approximation you can accept, "tailored"
approaches can help.

Finally, I also want to share a useful link for building and visualizing SMARTS
patterns. I recommend the SMARTSviewer website at the University of Hamburg:
   http://smartsview.zbh.uni-hamburg.de/

It has helped me many times to figure out what was wrong with my patterns.
There may be better tools out there, but I find it extremely useful myself.

S



On 03/10/2015 11:57 AM, Craig James wrote:
> The problem you have is actually quite difficult to solve in general. The
> example you gave illustrates why: it is "obvious" to a chemist's eye how the
> molecule was built, but the knowledge that the chemist has is very difficult
> to capture in a C++ program.
>
> Here is a suggestion for an algorithm. Begin by finding all places where each
> building block (SMARTS) matches the target, and record which atoms it matched
> (use an OBBitVec bitmap vector to record this). Create a list of every SMARTS
> and every place it matches. Note that each SMARTS may be on the list more than
> once, for example if it matches in two different places. You need to ensure
> that each time you add a SMARTS to the list, it is for a unique set of atoms.
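The first step can be sketched in Python under a few assumptions. Each match arrives as a list of 1-based atom indices, one list per mapping, which is what a SMARTS matcher typically reports. A plain Python int serves as the bitmap in place of OBBitVec. The helpers `to_mask` and `add_matches` are hypothetical names. Because a bitmask forgets atom order, two mappings of the same atoms collapse into one entry, which is the "unique set of atoms" check.

```python
def to_mask(mapping):
    """Turn one SMARTS mapping (1-based atom indices) into a bitmask.

    Atom order is discarded, so symmetric mappings of the same atoms coincide.
    """
    mask = 0
    for idx in mapping:
        mask |= 1 << (idx - 1)
    return mask


def add_matches(match_list, block, mappings):
    """Append (block, mask) for each mapping whose atom set is new for `block`."""
    seen = {m for name, m in match_list if name == block}
    for mapping in mappings:
        mask = to_mask(mapping)
        if mask not in seen:
            seen.add(mask)
            match_list.append((block, mask))
    return match_list


matches = []
# Two ways pattern A maps onto the thiophene half of Ching's target
# (atoms numbered 1-16 in SMILES order, worked out by hand): same atoms,
# different order, so only one entry should survive.
add_matches(matches, "A", [[1, 2, 3, 4, 12, 16, 15, 14, 13],
                           [4, 3, 2, 1, 16, 12, 13, 14, 15]])
print(len(matches))  # 1
```

In Open Babel itself, I believe `OBSmartsPattern::GetUMapList()` already returns only unique atom sets after `Match()`, so this de-duplication may come for free. That is worth confirming in the API docs.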
>
> For your example, you would discover that A, B, and C each match once, so your
> list would be three items long.
>
> Next, you try all possible combinations of your SMARTS matches. In your
> example you would try:
>
>    A
>    AB
>    ABC
>    AC
>    B
>    BC
>    C
>
> For each combination, you would use a bitwise OR of the OBBitVec bitmaps to
> detect whether the building blocks "cover" the entire molecule. You would
> discover that combinations AB and ABC both cover all atoms of the target
> molecule. Since AB is simpler than ABC (it uses fewer building blocks), you
> would reject ABC.
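Here is a minimal Python sketch of the combination step. The atom sets for A, B and C in Ching's target were worked out by hand, with atoms numbered 1-16 in SMILES order and C matched with the original [#6]/~ pattern. Python ints stand in for OBBitVec, and `smallest_cover` is a hypothetical helper. Instead of generating every combination and then discarding the larger ones, it tries combinations in order of increasing size and stops at the first one that covers all atoms. This gives the same "AB beats ABC" preference.

```python
from itertools import combinations

# Hand-derived atom sets for Ching's target C1C=C2C(C=CC3=COC=C23)C2=CSC=C12.
MATCHES = {
    "A": {1, 2, 3, 4, 12, 13, 14, 15, 16},   # thiophene half
    "B": {3, 4, 5, 6, 7, 8, 9, 10, 11},      # furan half
    "C": {1, 2, 3, 4, 5, 6, 7, 11, 12, 16},  # the two fused six-membered rings
}
N_ATOMS = 16


def mask(atoms):
    """Bitmask with bit (i - 1) set for every 1-based atom index i."""
    m = 0
    for i in atoms:
        m |= 1 << (i - 1)
    return m


def smallest_cover(matches, n_atoms):
    """Return the first combination, by increasing size, whose OR covers all atoms."""
    full = (1 << n_atoms) - 1
    labels = sorted(matches)
    masks = {label: mask(atoms) for label, atoms in matches.items()}
    for size in range(1, len(labels) + 1):
        for combo in combinations(labels, size):
            covered = 0
            for label in combo:
                covered |= masks[label]  # bitwise OR of the match bitmaps
            if covered == full:
                return combo
    return None  # no combination explains the whole molecule


print(smallest_cover(MATCHES, N_ATOMS))  # ('A', 'B')
```

If a building block matches in two places, give each match its own label (e.g. "A1", "A2") so the two can be picked independently. When several combinations of the same size cover the molecule, this returns the first in alphabetical order, so ties would need an extra rule.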
>
> Craig
>
>
> On Tue, Mar 10, 2015 at 11:31 AM, Ching Yen Shih <ching...@buffalo.edu> wrote:
>
>     Thanks Noel! Let me restate my problem and make it clearer with an example.
>     I have three building blocks:
>
>     <A>: C1C=CCC2=CSC=C12
>     <B>: C1C=CCC2=COC=C12
>     <C>: C1=CC2=C(C=C1)C=CC=C2
>     Here I have a target molecule for which I only know the SMILES,
>     "C1C=C2C(C=CC3=COC=C23)C2=CSC=C12".
>
>     To figure out how this target molecule was built, I ran my substructure
>     search program with these SMARTS strings:
>     <A>: [#6]~1~[#6]~[#6]~[#6]~[#6]~2~[#6]~1~[#6]~s~[#6]~2
>     <B>: [#6]~1~[#6]~[#6]~[#6]~[#6]~2~[#6]~1~[#6]~o~[#6]~2
>     <C>: [#6]~1~[#6]~[#6]~[#6]~[#6]~2~[#6]~1~[#6]~[#6]~[#6]~[#6]~2
>
>     I got the result that it contains "A = 1, B = 1, C = 1". But I only want the
>     smallest number of building blocks that make up the whole target molecule.
>     In this case, according to the graph shown below, the result should be
>     "A = 1, B = 1, C = 0", i.e. I only need A and B to make the above target
>     molecule.
>
>     As can be seen in the graph of the target molecule (shown below), it was
>     clearly built from A and B.
>
>     So I am wondering how I can get the correct composition, and beyond that
>     the sequence, of the building blocks in the target molecule by using the
>     SMARTS strings of the building blocks.
>
>     This is just a simplified instance of my real problem. My real target
>     molecules are built from 3 to 5 out of 30 building blocks, and the
>     building blocks can be linked or fused together.
>
>
>     Thanks for your patience.
>
>     Best,
>     Ching-Yen
>
>
>
>
>     On Tue, Mar 10, 2015 at 5:04 AM, Noel O'Boyle <baoille...@gmail.com> wrote:
>
>         I think you need to think about solving your original problem in a
>         different way. If you can post an example of the sort of molecule you
>         are talking about and the solution you would like to see, then
>         maybe you will get some useful suggestions from this list.
>
>         - Noel
>
>         On 6 March 2015 at 03:44, Ching Yen Shih <ching...@buffalo.edu> wrote:
>          > Hi all,
>          >
>          > I want to identify the sequence of substructures (building blocks) in
>          > the target molecules. I already know which building blocks are in the
>          > target molecules. The building blocks might be linked or fused, so I
>          > am wondering if there is a notation for a fused bond. Say I have a
>          > SMARTS string for benzene: is it possible to just add a notation
>          > between two benzene strings and use it to search for naphthalene?
>          > Or is there any other suggestion for identifying the sequence in
>          > this case?
>          >
>          > Thanks for your attention and time.
>          >
>          > Best,
>          > Ching-Yen
>          >
>          >
>          > ------------------------------------------------------------------------------
>          > Dive into the World of Parallel Programming The Go Parallel Website,
>          > sponsored by Intel and developed in partnership with Slashdot Media, is
>          > your hub for all things parallel software development, from weekly
>          > thought leadership blogs to news, videos, case studies, tutorials and
>          > more. Take a look and join the conversation now.
>          > http://goparallel.sourceforge.net/
>          > _______________________________________________
>          > OpenBabel-discuss mailing list
>          > OpenBabel-discuss@lists.sourceforge.net
>          > https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
>          >
>
>
>
>
>
>
>
> --
> ---------------------------------
> Craig A. James
> Chief Technology Officer
> eMolecules, Inc.
> ---------------------------------

-- 
  Stefano Forli, PhD

  Staff Scientist
  Molecular Graphics Laboratory
  Dept. of Integrative Structural
   and Computational Biology, MB-112F
  The Scripps Research Institute
  10550  North Torrey Pines Road
  La Jolla,  CA 92037-1000,  USA.

     tel: +1 (858)784-2055
     fax: +1 (858)784-2860
     email: fo...@scripps.edu
     http://www.scripps.edu/~forli/
