John Machin>FWIW, here's a brief UAT report: Here is something else for you. Note: for more correct comparisons, for the following tests I've disabled the use of Psyco in Graph (usually enabled, if present). This looks faster than merge2 for this (quite sparse) random pairs test:
. import graph # http://www.fantascienza.net/animalia/graph.zip . def merge4(l): . g = graph.Graph() . for n1,n2 in l: . g.addArc(n1,n2) . return g.connectedComponents() Creating a better addArcs produces good results (addArcs method is present in Graph, but addArcs2 isn't present): . def merge5(l): . g = graph.Graph() . g.addArcs2(l) . return g.connectedComponents() If you like to see the actual code, without using Graph, I've made a very stripped down version of addArcs2 and connectedComponents: . from collections import deque . from itertools import chain . . def merge6(l): . # graph creation . gi = {} . go = {} . for n1,n2 in l: . if n1 not in go: . go[n1] = {} . gi[n1] = {} . if n2 not in go: . go[n2] = {} . gi[n2] = {} . go[n1][n2] = 0 . gi[n2][n1] = 0 . # Connected components: . result = [] . visited = set() . for root in go.iterkeys(): . if root not in visited: . component = [root] . visited.add(root) . Q = deque() # A queue . Q.append(root) . while Q: . n1 = Q.popleft() . for n2 in chain( go[n1].iterkeys(), gi[n1].iterkeys() ): . if n2 not in visited: . visited.add(n2) . Q.append(n2) . component.append(n2) . result.append(component) . return result At the end of the tests I've added this to be sure the results are always the same: r2 = set( frozenset(e) for e in merge2(l) ) r4 = set( frozenset(e) for e in merge4(l) ) r5 = set( frozenset(e) for e in merge5(l) ) r6 = set( frozenset(e) for e in merge6(l) ) assert r2 == r4 assert r2 == r6 For xMax, nPairs = 1000, 5000: merge2: 15750 function calls in 0.939 CPU seconds merge4: 11029 function calls in 0.560 CPU seconds merge5: 6030 function calls in 0.303 CPU seconds merge6: 6011 function calls in 0.297 CPU seconds For xMax, nPairs = 2000, 10000: merge2: 31503 function calls in 2.505 CPU seconds merge6: 12011 function calls in 0.639 CPU seconds Timings using clock() (much more reliable than the profilers!), for xMax, nPairs = 2000, 10000: merge2: 1.222 merge6: 0.201 Probably merge6 can be improved, reducing memory used... Bear hugs, Bearophile -- http://mail.python.org/mailman/listinfo/python-list