Tim Chase wrote: > I've got several iterators sharing a common key in the same order and > would like to iterate over them in parallel, operating on all items > with the same key. I've simplified the data a bit here, but it would > be something like > > data1 = [ # key, data1 > (1, "one A"), > (1, "one B"), > (2, "two"), > (5, "five"), > ] > > data2 = [ # key, data1 > (1, "uno"), > (2, "dos"), > (3, "tres x"), > (3, "tres y"), > (3, "tres z"), > (4, "cuatro"), > ] > > data3 = [ # key, data1, data2 > (2, "ii", "extra alpha"), > (4, "iv", "extra beta"), > (5, "v", "extra gamma"), > ] > > And I'd like to do something like > > for common_key, d1, d2, d3 in magic_happens_here(data1, data2, data3): > for row in d1: > process_a(common_key, row) > for thing in d2: > process_b(common_key, row) > for thing in d3: > process_c(common_key, row) > > which would yield the common_key, along with enough of each of those > iterators (note that gaps can happen, but the sortable order should > remain the same). So in the above data, the outer FOR loop would > happen 5 times with common_key being [1, 2, 3, 4, 5], and each of > [d1, d2, d3] being an iterator that deals with just that data. > > My original method was hauling everything into memory and making > multiple passes filtering on the data. However, the actual sources > are CSV-files, some of which are hundreds of megs in size, and my > system was taking a bit of a hit. So I was hoping for a way to do > this with each iterator making only one complete pass through each > source (since they're sorted by common key). > > It's somewhat similar to the *nix "join" command, only dealing with > N files. > > Thanks for any hints. > > -tkc
A bit messy, might try replacing groups list with a dict: $ cat merge.py from itertools import groupby from operator import itemgetter first = itemgetter(0) rest = itemgetter(slice(1, None)) def magic_happens_here(*columns): grouped = [groupby(column, key=first) for column in columns] missing = object() def getnext(g): nonlocal n try: k, g = next(g) except StopIteration: n -= 1 return (missing, None) return k, g n = len(grouped) groups = [getnext(g) for g in grouped] while n: minkey = min(k for k, g in groups if k is not missing) yield (minkey,) + tuple( map(rest, g) if k == minkey else () for k, g in groups) for i, (k, g) in enumerate(groups): if k == minkey: groups[i] = getnext(grouped[i]) if __name__ == "__main__": data1 = [ # key, data1 (1, "one A"), (1, "one B"), (2, "two"), (5, "five"), ] data2 = [ # key, data1 (1, "uno"), (2, "dos"), (3, "tres x"), (3, "tres y"), (3, "tres z"), (4, "cuatro"), ] data3 = [ # key, data1, data2 (2, "ii", "extra alpha"), (4, "iv", "extra beta"), (5, "v", "extra gamma"), ] for common_key, d1, d2, d3 in magic_happens_here(data1, data2, data3): print(common_key) for d in d1, d2, d3: print(" ", list(d)) $ python3 merge.py 1 [('one A',), ('one B',)] [('uno',)] [] 2 [('two',)] [('dos',)] [('ii', 'extra alpha')] 3 [] [('tres x',), ('tres y',), ('tres z',)] [] 4 [] [('cuatro',)] [('iv', 'extra beta')] 5 [('five',)] [] [('v', 'extra gamma')] -- https://mail.python.org/mailman/listinfo/python-list