The problem you have is actually quite difficult to solve in general. The
example you gave illustrates why: it is "obvious" to a chemist's eye how
the molecule was built, but the knowledge that the chemist has is very
difficult to capture in a C++ program.
Here is a suggestion for an algorithm. Begin by finding every place where
each building block (SMARTS) matches the target, and record which atoms it
matched. (An OBBitVec bit vector is a convenient way to record this.) Build
a list with one entry for every SMARTS and every place it matches. Note that
a SMARTS may be on the list more than once, for example if it matches in two
different places. A symmetric pattern can also match the same atoms in more
than one order, so make sure that each entry you add for a SMARTS is for a
unique set of atoms.
For your example, you would have discovered that A, B, and C match one time
each, so your list would be three items long.
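As a rough sketch of this first step, the snippet below turns match lists into bitmaps and keeps one entry per distinct atom set for each building block. It uses std::bitset from the standard library as a stand-in for OBBitVec so that it compiles on its own. The names AtomMask, BlockMatch, ToMask and AddMatches, and the kMaxAtoms limit, are made up for illustration. The match lists (one vector of atom indices per match) are assumed to come from your existing SMARTS search.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Assumption: no target molecule has more than kMaxAtoms atoms.
constexpr std::size_t kMaxAtoms = 256;
using AtomMask = std::bitset<kMaxAtoms>;  // stand-in for OBBitVec

struct BlockMatch {
    std::string block;  // building-block label, e.g. "A"
    AtomMask atoms;     // bit i is set if target atom i is part of this match
};

// Convert one match (a list of atom indices) into a bitmap.
AtomMask ToMask(const std::vector<int>& atomIndices) {
    AtomMask mask;
    for (int idx : atomIndices) {
        assert(idx >= 0 && static_cast<std::size_t>(idx) < kMaxAtoms);
        mask.set(static_cast<std::size_t>(idx));
    }
    return mask;
}

// Append the matches of one building block to `out`, skipping any match
// whose atom set was already recorded for this block (for example the
// same atoms matched in a different order).
void AddMatches(const std::string& block,
                const std::vector<std::vector<int>>& matches,
                std::vector<BlockMatch>& out) {
    std::unordered_set<AtomMask> seen;
    for (const auto& match : matches) {
        const AtomMask mask = ToMask(match);
        if (seen.insert(mask).second) {
            out.push_back({block, mask});
        }
    }
}
```

For example, calling AddMatches("A", {{0, 1, 2}, {2, 1, 0}, {3, 4, 5}}, list) would add two entries, because the first two matches cover the same atoms.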
Next, try every possible combination of your SMARTS matches. In your
example you would try:
A
AB
AC
ABC
B
BC
C
For each combination, OR the OBBitVec bitmaps together and check whether
the result has a bit set for every atom, i.e. whether the building blocks
"cover" the entire molecule. You would discover that combinations AB and
ABC both cover all atoms of the target molecule. Since AB uses fewer
building blocks than ABC, you would reject ABC.
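The combination step could be sketched roughly as follows. Again this is only an illustration under assumptions: std::bitset stands in for OBBitVec, atoms are numbered 0 to numAtoms - 1, and SmallestCover, AtomMask and kMaxAtoms are hypothetical names. It tries every non-empty combination of list entries, ORs their bitmaps, and keeps a smallest combination that covers every atom.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: no target molecule has more than kMaxAtoms atoms.
constexpr std::size_t kMaxAtoms = 256;
using AtomMask = std::bitset<kMaxAtoms>;  // stand-in for OBBitVec

// Return the indices (into `matches`) of a smallest combination whose
// OR-ed bitmaps cover atoms 0 .. numAtoms-1. Returns an empty vector if
// no combination covers the molecule.
std::vector<std::size_t> SmallestCover(const std::vector<AtomMask>& matches,
                                       std::size_t numAtoms) {
    // Brute force over all subsets: only meant for short match lists.
    assert(numAtoms <= kMaxAtoms && matches.size() < 32);

    AtomMask target;
    for (std::size_t i = 0; i < numAtoms; ++i) target.set(i);

    std::vector<std::size_t> best;
    const std::uint32_t limit = 1u << matches.size();
    for (std::uint32_t subset = 1; subset < limit; ++subset) {
        AtomMask covered;
        std::vector<std::size_t> chosen;
        for (std::size_t i = 0; i < matches.size(); ++i) {
            if (subset & (1u << i)) {
                covered |= matches[i];  // bitwise OR of the atom bitmaps
                chosen.push_back(i);
            }
        }
        const bool coversAll = (covered & target) == target;
        if (coversAll && (best.empty() || chosen.size() < best.size())) {
            best = chosen;
        }
    }
    return best;
}
```

The loop visits 2^n - 1 combinations for n list entries, so this brute-force version is only practical for short lists. With many building blocks you would probably want to cap the combination size (for instance at five entries) or prune entries that add no new atoms.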
Craig
On Tue, Mar 10, 2015 at 11:31 AM, Ching Yen Shih <ching...@buffalo.edu>
wrote:
> Thanks Noel! Let me restate my problem and make it more clear with an
> example.
> I have three building blocks, which are
>
> <A>: C1C=CCC2=CSC=C12
> <B>: C1C=CCC2=COC=C12
> <C>: C1=CC2=C(C=C1)C=CC=C2
> Here I have a target molecule, known only by its SMILES
> "C1C=C2C(C=CC3=COC=C23)C2=CSC=C12".
>
> To figure out how this target molecule was built,
> I ran my substructure-search program with these SMARTS strings:
> <A>: [#6]~1~[#6]~[#6]~[#6]~[#6]~2~[#6]~1~[#6]~s~[#6]~2
> <B>: [#6]~1~[#6]~[#6]~[#6]~[#6]~2~[#6]~1~[#6]~o~[#6]~2
> <C>: [#6]~1~[#6]~[#6]~[#6]~[#6]~2~[#6]~1~[#6]~[#6]~[#6]~[#6]~2
>
> I got the result that it contains "A=1, B=1, C=1". But I only want the
> smallest number of building blocks that would make up the whole target
> molecule. In this case, according to the graph shown below, the result
> should be "A=1, B=1, C=0", i.e. I only need A and B to make the above
> target molecule.
>
> As can be seen in the graph of the target molecule (shown below), it is
> apparent that it was built from A and B.
>
>
> So I am wondering how I can get the correct composition and, further, the
> sequence of the building blocks of the target molecule using the SMARTS
> strings of the building blocks.
>
> This is just a simplified instance of my real problem.
> My real target molecules are built from 3 to 5 out of 30 building blocks.
> The building blocks can be linked or fused together.
>
>
> Thanks for your patience.
>
> Best,
> Ching-Yen
>
> On Tue, Mar 10, 2015 at 5:04 AM, Noel O'Boyle <baoille...@gmail.com>
> wrote:
>
>> I think you need to think about solving your original problem in a
>> different way. If you can post an example of the sort of molecule you
>> are talking about and what the solution you would like to see, then
>> maybe you will get some useful suggestions from this list.
>>
>> - Noel
>>
>> On 6 March 2015 at 03:44, Ching Yen Shih <ching...@buffalo.edu> wrote:
>> > Hi all,
>> >
>> > I want to identify the sequence of substructures (building blocks) in the
>> > target molecules. I already know which building blocks are in the target
>> > molecules. The building blocks might be linked or fused. So, I am
>> > wondering if there is a notation for a fused bond. Say I have a SMARTS
>> > string for benzene: is it possible to just add a notation between two
>> > benzene strings and use it to search for naphthalene?
>> > Or is there any other suggestion for identifying the sequence?
>> >
>> > Thanks for your attention and time.
>> >
>> > Best,
>> > Ching-Yen
>> >
>> >
>> >
>> ------------------------------------------------------------------------------
>> > Dive into the World of Parallel Programming The Go Parallel Website,
>> > sponsored
>> > by Intel and developed in partnership with Slashdot Media, is your hub
>> for
>> > all
>> > things parallel software development, from weekly thought leadership
>> blogs
>> > to
>> > news, videos, case studies, tutorials and more. Take a look and join the
>> > conversation now. http://goparallel.sourceforge.net/
>> > _______________________________________________
>> > OpenBabel-discuss mailing list
>> > OpenBabel-discuss@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
>> >
>>
>
>
>
--
---------------------------------
Craig A. James
Chief Technology Officer
eMolecules, Inc.
---------------------------------