If Ching's structures are correct, then the first two patterns have two saturated carbons, while the third one doesn't (it happens to be naphthalene).
The original SMARTS pattern for the third building block requires only generic
carbons ([#6]) joined by any kind of bond (~), so it also matches the target
structure. To restrict it to aromatic carbons, the pattern should be something
like this:

    [c]~1~[c]~[c]~[c]~[c]~2~[c]~1~[c]~[c]~[c]~[c]~2

Overall, I agree with Craig: these problems are not easy to solve with a
general solution, but depending on the level of approximation, "tailored"
approaches can help.

Finally, I also want to share a useful link for building and visualizing
SMARTS patterns. I recommend the SMARTSviewer website at Hamburg University:

http://smartsview.zbh.uni-hamburg.de/

It has helped me many times to figure out what was wrong with my patterns.
There may be better tools, but I find it extremely useful.

S

On 03/10/2015 11:57 AM, Craig James wrote:
> The problem you have is actually quite difficult to solve in general. The
> example you gave illustrates why: it is "obvious" to a chemist's eye how the
> molecule was built, but the knowledge that the chemist has is very difficult
> to capture in a C++ program.
>
> Here is a suggestion for an algorithm. Begin by finding all places where each
> building block (SMARTS) matches the target, and record which atoms it matched.
> (Use an OBBitVec bitmap vector to record this). Create a list of every SMARTS
> and every place it matches. Note that each SMARTS may be on the list more than
> once, for example if it matches in two different places. You need to ensure
> that each time you add a SMARTS to the list, it is for a unique set of atoms.
>
> For your example, you would have discovered that A, B, and C match one time
> each, so your list would be three items long.
>
> Next, you try all possible combinations of your SMARTS matches. In your
> example you would try:
>
> A
> AB
> ABC
> B
> BC
> C
>
> For each combination, you would use a bitwise OR operation of the OBBitVec
> bitmaps to detect whether the building blocks "cover" the entire molecule.
> You would discover that combinations AB and ABC both cover all atoms of the
> target molecule. Since AB is simpler than ABC, you would reject ABC.
>
> Craig
>
> On Tue, Mar 10, 2015 at 11:31 AM, Ching Yen Shih <ching...@buffalo.edu> wrote:
>
>     Thanks Noel! Let me restate my problem and make it more clear with an
>     example.
>     I have three building blocks, which are
>
>     <A>: C1C=CCC2=CSC=C12
>     Inline image 1
>     <B>: C1C=CCC2=COC=C12
>     Inline image 2
>     <C>: C1=CC2=C(C=C1)C=CC=C2
>     Inline image 3
>     Here I got a target molecule only with its SMILES
>     "C1C=C2C(C=CC3=COC=C23)C2=CSC=C12".
>
>     To figure out how this target molecule was built,
>     I ran my program of substructure searching with SMARTS strings.
>     <A>: [#6]~1~[#6]~[#6]~[#6]~[#6]~2~[#6]~1~[#6]~s~[#6]~2
>     <B>: [#6]~1~[#6]~[#6]~[#6]~[#6]~2~[#6]~1~[#6]~o~[#6]~2
>     <C>: [#6]~1~[#6]~[#6]~[#6]~[#6]~2~[#6]~1~[#6]~[#6]~[#6]~[#6]~2
>
>     I got the result that it contains "A=1, B=1, C=1". But I only want the
>     least number of building blocks that would make the whole target molecule.
>     In this case, according to the graph shown below, the result should be
>     "A = 1, B = 1, C = 0", i.e. I only need A and B to make the above target
>     molecule.
>
>     As it can be seen in the graph (shown below) of target molecule, it is
>     apparent that it was built by A and B.
>
>     So I am wondering how I can get the correct composition and further, the
>     sequence of the building blocks of the target molecule by using SMARTS
>     string of the building blocks.
>
>     It's just a simplified instance of my real problem.
>     My real target molecules are built by 3~5 out of 30 building blocks.
>     The building blocks can be linked or fused together.
>
>     Thanks for your patience.
>
>     Best,
>     Ching-Yen
>
>     On Tue, Mar 10, 2015 at 5:04 AM, Noel O'Boyle <baoille...@gmail.com> wrote:
>
>         I think you need to think about solving your original problem in a
>         different way.
>         If you can post an example of the sort of molecule you are talking
>         about and what the solution you would like to see, then maybe you will
>         get some useful suggestions from this list.
>
>         - Noel
>
>         On 6 March 2015 at 03:44, Ching Yen Shih <ching...@buffalo.edu> wrote:
>         > Hi all,
>         >
>         > I want to identify the sequence of substructures (building blocks)
>         > in the target molecules. I already know which building blocks are in
>         > the target molecules. The building blocks might be linked or fused.
>         > So, I am wondering if there is a notation for fused bond. Say, I
>         > have a SMARTS string for benzene, is it possible I just add a
>         > notation between two benzene strings and use it for searching
>         > naphthalene?
>         > Or any other suggestion for this case about identifying the
>         > sequence?
>         >
>         > Thanks for your attention and time.
>         >
>         > Best,
>         > Ching-Yen
>         >
>         > ------------------------------------------------------------------------------
>         > Dive into the World of Parallel Programming The Go Parallel Website,
>         > sponsored by Intel and developed in partnership with Slashdot Media,
>         > is your hub for all things parallel software development, from
>         > weekly thought leadership blogs to news, videos, case studies,
>         > tutorials and more. Take a look and join the conversation now.
>         > http://goparallel.sourceforge.net/
>         > _______________________________________________
>         > OpenBabel-discuss mailing list
>         > OpenBabel-discuss@lists.sourceforge.net
>         > https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
>
> --
> ---------------------------------
> Craig A. James
> Chief Technology Officer
> eMolecules, Inc.
> ---------------------------------

--
Stefano Forli, PhD
Staff Scientist
Molecular Graphics Laboratory
Dept. of Integrative Structural and Computational Biology, MB-112F
The Scripps Research Institute
10550 North Torrey Pines Road
La Jolla, CA 92037-1000, USA.
tel: +1 (858)784-2055
fax: +1 (858)784-2860
email: fo...@scripps.edu
http://www.scripps.edu/~forli/
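
As a rough illustration of the covering procedure Craig describes in the quoted message, here is a minimal Python sketch. It is not Open Babel code: plain Python integers stand in for OBBitVec, the function names (atoms_to_mask, minimal_covers) are made up for this example, and the atom indices for A, B, and C were worked out by hand from the target SMILES rather than produced by a real SMARTS search. One deliberate difference from the quoted list of combinations: it tries smaller combinations first, so the simplest cover is found before any larger one.

```python
from itertools import combinations


def atoms_to_mask(atom_indices):
    """Pack a collection of atom indices into an integer bitmask."""
    mask = 0
    for i in atom_indices:
        mask |= 1 << i
    return mask


def minimal_covers(n_atoms, matches):
    """Return the smallest combinations of matches that cover every atom.

    matches is a list of (block_name, atom_indices) pairs, one per place
    where a building block matched the target.
    """
    full = (1 << n_atoms) - 1

    # Keep each (block, atom set) only once; a symmetric pattern can hit
    # the same atoms several times in a different order.
    seen = set()
    unique = []
    for name, atoms in matches:
        key = (name, atoms_to_mask(atoms))
        if key not in seen:
            seen.add(key)
            unique.append(key)

    # Try combinations of increasing size and stop at the first size
    # for which at least one combination covers the whole molecule.
    for size in range(1, len(unique) + 1):
        found = []
        for combo in combinations(unique, size):
            covered = 0
            for _, mask in combo:
                covered |= mask  # bitwise OR of the per-match bitmaps
            if covered == full:
                found.append(tuple(name for name, _ in combo))
        if found:
            return found
    return []


# Hypothetical match lists: atoms numbered 0-15 in the order they appear in
# "C1C=C2C(C=CC3=COC=C23)C2=CSC=C12" (indices assigned by hand, an assumption).
matches = [
    ("A", [0, 1, 2, 3, 11, 12, 13, 14, 15]),   # six-ring fused to thiophene
    ("B", [2, 3, 4, 5, 6, 7, 8, 9, 10]),       # six-ring fused to furan
    ("C", [0, 1, 2, 3, 4, 5, 6, 10, 11, 15]),  # the two fused six-rings
]

print(minimal_covers(16, matches))  # [('A', 'B')]
```

With real data, the match lists would come from whatever substructure search is already in use; the number of combinations grows quickly with the number of matches, so this brute-force form only suits small lists like the 3 to 5 blocks mentioned in the thread.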