Hi, During a fun coding session yesterday, I came across a problem that I thought was already solved by itertools, but on investigation it seems it isn't.
The problem is simple: given one or more ordered sequences, return only the objects that appear in each sequence, without reading the whole set into memory. This is basically an SQL many-many join. I thought it could be accomplished through recursively embedded generators, but that approach failed in the end. After posting the question to Stack Overflow[0], Martin Geisler proposed a wonderfully succinct and reusable solution (see below, or pretty printed at the Stack Overflow URL). It is my opinion that this particular implementation is a wonderful and incredibly valid use of iterators, and something that could be reused by others, certainly least not myself again in the future. With that in mind I thought it, or something very similar, would be a great addition to the itertools module. My question then is, are there better approaches to this? The heapq- based solution at the Stack Overflow page is potentially more useful still, for its ability to operate on orderless sequences, but in that case, it might be better to simply listify each sequence, and sort it before passing to the ordered-only functions. Thanks, David. Stack Overflow page here: http://stackoverflow.com/questions/969709/joining-a-set-of-ordered-integer-yielding-python-iterators Sweet solution: import operator def intersect(sequences): """Compute intersection of sequences of increasing integers. >>> list(intersect([[1, 100, 142, 322, 12312], ... [2, 100, 101, 322, 1221], ... [100, 142, 322, 956, 1222]])) [100, 322] """ iterators = [iter(seq) for seq in sequences] last = [iterator.next() for iterator in iterators] indices = range(len(iterators)) while True: # The while loop stops when StopIteration is raised. The # exception will also stop the iteration by our caller. if reduce(operator.and_, [l == last[0] for l in last]): # All iterators contain last[0] yield last[0] last = [iterator.next() for iterator in iterators] # Now go over the iterators once and advance them as # necessary. To stop as soon as the smallest iterator we # advance each iterator only once per loop iteration. for i in indices[:-1]: if last[i] < last[i+1]: last[i] = iterators[i].next() if last[i] > last[i+1]: last[i+1] = iterators[i+1].next() -- http://mail.python.org/mailman/listinfo/python-list