On Mon, Jun 2, 2008 at 6:31 AM, Giuseppe Ottaviano <[EMAIL PROTECTED]> wrote: > > On Jun 1, 2008, at 6:16 PM, George Sakkis wrote: > >> Equivalence is a class that can be used to maintain a partition of >> objects into equivalence sets, making sure that the equivalence >> properties (reflexivity, symmetry, transitivity) are preserved. Two >> objects x and y are considered equivalent either implicitly (through a >> key function) or explicitly by calling merge(x,y). > > I think this library would be very useful, and I like the interface (in > particular the idea of the projecting function),
Thanks for trying it out! > but I think the algorithm > is less than optimal. It can show quadratic behavior in some cases: > > $ python -m timeit -s 'import equivalence' -s 'eq = > equivalence.Equivalence()' 'for i in xrange(10000): eq.merge(i, i+1)' > 10 loops, best of 3: 57.6 msec per loop > > $ python -m timeit -s 'import equivalence' -s 'eq = > equivalence.Equivalence()' 'for i in xrange(10000): eq.merge(i+1, i)' > 10 loops, best of 3: 2.59 sec per loop Interesting.. it took me a while to figure out why the second case is so much slower and you're right, it is indeed quadratic. I don't know how likely would such pathological cases be in practice, given that the preferred way to merge a batch of objects is eq.merge(*xrange(10001)), which is more than 3 times faster than the non-pathologic first case (and which I optimized even further to avoid attribute lookups within the loop so it's more like 5 times faster now). Also the batch version in this case remains linear even if you merge backwards, eq.merge(*xrange(10000,-1,-1)), or in any order for that matter. > Have you considered using the Union-Find algorithm, which would be (almost) > linear in all cases? I am familiar with it and I will certainly consider it for the next version; for now I was primarily interested in functionality (API) and correctness. Thanks, George -- http://mail.python.org/mailman/listinfo/python-list