Sorry: typo "...quickly eliminate some duplicates using non-canonical
SMILES..."

On Wed, 7 Nov 2018 at 09:32, Noel O'Boyle <baoille...@gmail.com> wrote:

> I don't believe that there is a better way to eliminate duplicates apart
> from using canonical forms (e.g. InChI or canonical SMILES, though InChI
> also changes the structure so use with care). You can quickly eliminate
> some duplicates using canonical SMILES or a matrix comparison as you
> suggest, but to actually be sure you will still have to use a canonical
> form.
>
> I'm not sure exactly what you're doing but another idea would be to avoid
> generating some structures in the first place using graph symmetry
> (available in OB). For example, don't apply the same reaction to symmetry
> equivalent atoms.
>
> - Noel
>
> On Fri, 2 Nov 2018 at 18:08, Xianghai Sheng <xsh...@ucmerced.edu> wrote:
>
>> Hi Andrew and Noel,
>>
>> Thank you for the inspiring answers!
>>
>> Andrew: Sorry I didn't make it clear, but 1/2 of the time is only for a
>> random job I tested and I don't know the scaling of it. Its cost may grow
>> faster than the searching algorithm. I think it is beneficial to know the
>> formal time complexity in this sense. But I agree that it is mostly useful
>> when thinking about large N. I may have to just measure the actual scaling
>> and deal with it.
>>
>> Noel: That's a brilliant idea! I do think most of the time is wasted on
>> canonicalizing the same graph in my code. I will give it a try. On another
>> note, the purpose of canonicalization in my code is to eliminate duplicates
>> in my reaction path network. Is there a better way to do it? Molecules in
>> my code are represented by bond-electron matrix. A naive approach would be
>> to sort rows in both matrices and compare them row by row. (I think this
>> takes O(N^2*logN))
>>
>> Best,
>> Xianghai
>>
>> On Fri, Nov 2, 2018 at 3:31 AM Noel O'Boyle <baoille...@gmail.com> wrote:
>>
>>> First of all, I'd recommend using the development code rather than the
>>> latest release. Also, ensure that hydrogens are suppressed. You may also
>>> consider trying to avoid multiple canonicalisations of the same graph by
>>> just writing out a non-canonical SMILES, and only at the end collating them
>>> and canonicalising.
>>>
>>> While I agree with Andrew, it is also true that Open Babel's
>>> canonicalisation implementation is slow. This is partly due to the fact
>>> that we calculate all of the graph invariants up-front rather than lazily
>>> to split ties. In particular, we calculate the pairwise distance of every
>>> atom from every other atom (as far as I remember). This is O(N**2).
>>>
>>>
>>> Regards,
>>> - Noel
>>>
>>> On Fri, 2 Nov 2018 at 08:02, Andrew Dalke <da...@dalkescientific.com>
>>> wrote:
>>>
>>>> On Nov 2, 2018, at 01:33, Xianghai Sheng <xsh...@ucmerced.edu> wrote:
>>>> > I am trying to figure out the time complexity (Big O) of converting
>>>> from OBMol to canonical smiles.
>>>>
>>>> I don't think that's a useful way to think about your problem.
>>>>
>>>> Graph canonicalization is "at least as computationally hard as the
>>>> graph isomorphism problem."
>>>> https://en.wikipedia.org/wiki/Graph_canonization , and graph
>>>> isomorphism is believed to be quasi-polynomial.
>>>>
>>>> That said, I believe the actual algorithm Open Babel uses may have
>>>> exponential time for some graphs.
>>>>
>>>> However, most molecular graphs don't reach the worst-time behavior.
>>>>
>>>> Furthermore, time complexity is mostly useful when thinking about large
>>>> N. Molecules typically have small N, and there may be lower-order terms
>>>> which dominate the times for small N.
>>>>
>>>> If you are interested in the topic of graph canonicalization algorithms
>>>> in general, you might start with the nauty papers; see
>>>> http://pallini.di.uniroma1.it/
>>>>
>>>> > I want to know this because getting canonical smiles takes up half of
>>>> the running time of my reaction path searching program.
>>>>
>>>> How would knowing the time complexity help you?
>>>>
>>>> While this may be obvious, if canonicalization takes 1/2 the time, then
>>>> you'll only get 2x speedup even if that time goes to 0. How much work are
>>>> you willing to do for a 2x speedup?
>>>>
>>>> Might it be possible to do about the same amount of work to parallelize
>>>> your algorithm? For example, if your reaction path search generates a set
>>>> of nodes which must all be canonicalized in order to evaluate them, then
>>>> you can do all of those canonicalizations in parallel.
>>>>
>>>> Cheers,
>>>>
>>>>                                 Andrew
>>>>                                 da...@dalkescientific.com
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> OpenBabel-discuss mailing list
>>>> OpenBabel-discuss@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
>>>>
>>>
_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

Reply via email to