Re: ANN: equivalence 0.1

Giuseppe Ottaviano Mon, 02 Jun 2008 13:05:31 -0700


Interesting... it took me a while to figure out why the second case is
so much slower and you're right, it is indeed quadratic. I don't know
how likely such pathological cases would be in practice, given that
the preferred way to merge a batch of objects is
eq.merge(*xrange(10001)), which is more than 3 times faster than the
non-pathological first case (and which I optimized even further to avoid
attribute lookups within the loop so it's more like 5 times faster
now). Also the batch version in this case remains linear even if you
merge backwards, eq.merge(*xrange(10000,-1,-1)), or in any order for
that matter.

The example just showed what could happen if the merges are done in
pathological order; it is not about batch merging. I think that
pathological cases like this do show up in real cases: many
algorithms for near-duplicate elimination and clustering reduce to
finding the connected components of a graph whose edges are given as a
stream, so you can't control their order.

With this implementation, every time a component of size N is given as
the second (or a following) argument to merge, you pay Omega(N).
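
To make the edge-stream point concrete, here is a minimal sketch (Python 3) of a union-find structure with union by size and path compression, which is one standard way to keep a merge cheap regardless of argument order. It is not the code of the equivalence package, and the names DisjointSet, find and merge are hypothetical, chosen only for this illustration.

```python
class DisjointSet:
    """Hypothetical union-find sketch; NOT the equivalence package's code."""

    def __init__(self):
        self.parent = {}  # item -> parent item (a root is its own parent)
        self.size = {}    # root -> number of items in its component

    def find(self, x):
        """Return the representative (root) of x's component."""
        if x not in self.parent:
            # Unseen items start as singleton components.
            self.parent[x] = x
            self.size[x] = 1
            return x
        # First pass: walk up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: path compression, point every visited node at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def merge(self, a, b):
        """Merge the components of a and b; return True if they were distinct."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        # Union by size: always hang the smaller tree under the larger one,
        # so it does not matter which argument holds the big component.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size.pop(rb)
        return True


def connected_components(edges):
    """Count the components of a graph given as a stream of (u, v) edges."""
    ds = DisjointSet()
    for u, v in edges:
        ds.merge(u, v)
    return len(ds.size)  # one size entry survives per root
```

With this sketch, feeding the edges as (i, 0) for i in 1..N, so that the large component is always the second argument, does the same small amount of work per merge as the forward order, because only one parent pointer is rewritten per merge.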

I am familiar with it and I will certainly consider it for the next
version; for now I was primarily interested in functionality (API) and
correctness.

Good :)
