Interesting.. it took me a while to figure out why the second case is
so much slower and you're right, it is indeed quadratic. I don't know
how likely would such pathological cases be in practice, given that
the preferred way to merge a batch of objects is
eq.merge(*xrange(10001)), which is more than 3 times faster than the
non-pathologic first case (and which I optimized even further to avoid
attribute lookups within the loop so it's more like 5 times faster
now). Also the batch version in this case remains linear even if you
merge backwards, eq.merge(*xrange(10000,-1,-1)), or in any order for
that matter.

The example just showed what could happen if the merges are done in pathological order, it is not about batch merging. I think that pathological cases like this indeed show up in real cases: many algorithms of near duplicate elimination and clustering reduce to finding connected components of a graph whose edges are given as a stream, so you can't control their order. With this implementation, every time a component sized N is given a second (or following) argument to merge, you pay Omega(N).

I am familiar with it and I will certainly consider it for the next
version; for now I was primarily interested in functionality (API) and
correctness.

Good :) --
http://mail.python.org/mailman/listinfo/python-list

Reply via email to