John Machin: >Perhaps I'm missing a posting of yours -- what are merge2 and merge4? What is "this random pairs test"? What is xMax (nPairs isn't hard to guess), and how are you generating your input data?<
merge2 and this random pairs test comes from the post by Brian Beck. merge4 is the first in my last post. And merge3 isn't included (nor tested) because it uses the original (slow) addArcs. This is the simple test generator taken from his post: xMax = 3000 nPairs = xMax*5 l = [[randint(0,xMax), randint(0,xMax)] for i in xrange(nPairs)] For such high number of nPairs (arcs), the resulting graph usuall has one big connected component, and few tiny debris... To produce a more divided graph, nPairs have to be much smaller, like xMax*1.5. >FWIW, my offering is attached. < I'm sorry, I don't see the attach... (just the different title). John Machin: >Only David's uses stuff that's not in the Python 2.4 off-the-shelf distribution.< Well, using already mede functions and classes is good, it avoids you to reinvent things each time. Some of my versions too use my graph library (for a problem like this I usually don't create stand alone versions like merge6 and merge7). And my original point was adding one graph data structure to the standard Python library :-] I haven't tested the speed of David Eppstein version... Anyway, this is a simplified version of merge6, that uses an indirect graph g, instead of the directed graph, and avoids the use of chain: . from collections import deque . . def merge7(l): . # Undirect graph g creation: . g = {} . for n1,n2 in l: . if n1 not in g: g[n1] = {n2:0} . else: g[n1][n2] = 0 . if n2 not in g: g[n2] = {n1:0} . else: g[n2][n1] = 0 . # Connected components: . result = [] . visited = set() . for root in g.iterkeys(): . if root not in visited: . component = [root] . visited.add(root) . Q = deque() # A queue . Q.append(root) . while Q: . n1 = Q.popleft() . for n2 in g[n1].iterkeys(): . if n2 not in visited: . visited.add(n2) . Q.append(n2) . component.append(n2) . result.append(component) . return result It's a bit shorter and a little faster than merge6, memory used is similar (there is only one dictionary, so there is only one ammortization overhead of the dictionary, but it contains the sum of both arcs, so the ammortization can be twice as big, so... memory used can be the same). (Usually BFS and DFS memory requirements are a little different; this implementation uses BFS, and I think it uses a bit less memory than DFS, but for this problem/test the difference isn't significative.) Bear hugs, Bearophile -- http://mail.python.org/mailman/listinfo/python-list