Astley Le Jasper wrote: > I have a list of tuples that indicate a relationship, ie a is related > to b, b is related to c etc etc. What I want to do is cluster these > relationships into groups. An item will only be associated with a > single cluster. > > Before I started, I wondered if there was any particular tool within > Python I should be looking at. I don't expect anyone to code this for > me, just say ... "you need to look at using x". I was going to use > populate a dictionary and > > Sorry for being so vague. > > Example Data: > [(a,b) > (a,c) > (a,d) > (b,c) > (b,d) > (c,d) > (e,f) > (e,g) > (f,g) > (h,i)] > > Output (grouping id, item ref) > (1,a), > (1,b), > (1,c), > (1,d), > (2,e), > (2,f), > (2,g), > (3,h), > (3,i)
A straightforward implementation: $ cat group_edges.py def find_groups(edges): lookup = {} # node --> group groups = {} # id(group) --> group for a, b in edges: if a in lookup: if b in lookup: ga = lookup[a] gb = lookup[b] if ga is not gb: # merge groups if len(ga) < len(gb): # always update the smaller group ga, gb = gb, ga del groups[id(gb)] ga.update(gb) for k in gb: lookup[k] = ga else: # add b to a's group ga = lookup[a] ga.add(b) lookup[b] = ga elif b in lookup: # add a to b's group gb = lookup[b] gb.add(a) lookup[a] = gb else: # create new group g = set((a, b)) groups[id(g)] = g lookup[a] = lookup[b] = g return groups.values() def show_groups(edges): groups = sorted(sorted(g) for g in find_groups(edges)) for i, group in enumerate(groups, 1): for node in sorted(group): print i, node if __name__ == "__main__": edges = [ ('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd'), ('e', 'f'), ('e', 'g'), ('f', 'g'), ('h', 'i'), ("lonesome john", "lonesome john")] show_groups(edges) print show_groups([('a', 'c'), ('b', 'd'), ('c', 'd')]) $ python group_edges.py 1 a 1 b 1 c 1 d 2 e 2 f 2 g 3 h 3 i 4 lonesome john 1 a 1 b 1 c 1 d Untested. Peter -- http://mail.python.org/mailman/listinfo/python-list