On Jun 10, 11:24 pm, David Wilson <d...@botanicus.net> wrote: > Hi, > > During a fun coding session yesterday, I came across a problem that I > thought was already solved by itertools, but on investigation it seems > it isn't. > > The problem is simple: given one or more ordered sequences, return > only the objects that appear in each sequence, without reading the > whole set into memory. This is basically an SQL many-many join. > > I thought it could be accomplished through recursively embedded > generators, but that approach failed in the end. After posting the > question to Stack Overflow[0], Martin Geisler proposed a wonderfully > succinct and reusable solution (see below, or pretty printed at the > Stack Overflow URL). > > It is my opinion that this particular implementation is a wonderful > and incredibly valid use of iterators, and something that could be > reused by others, certainly least not myself again in the future. With > that in mind I thought it, or something very similar, would be a great > addition to the itertools module. > > My question then is, are there better approaches to this? The heapq- > based solution at the Stack Overflow page is potentially more useful > still, for its ability to operate on orderless sequences, but in that > case, it might be better to simply listify each sequence, and sort it > before passing to the ordered-only functions. > > Thanks, > > David. > > Stack Overflow page here: > > http://stackoverflow.com/questions/969709/joining-a-set-of-ordered-in... > > Sweet solution: > > import operator > > def intersect(sequences): > """Compute intersection of sequences of increasing integers. > > >>> list(intersect([[1, 100, 142, 322, 12312], > ... [2, 100, 101, 322, 1221], > ... [100, 142, 322, 956, 1222]])) > [100, 322] > > """ > iterators = [iter(seq) for seq in sequences] > last = [iterator.next() for iterator in iterators] > indices = range(len(iterators)) > while True: > # The while loop stops when StopIteration is raised. The > # exception will also stop the iteration by our caller. > if reduce(operator.and_, [l == last[0] for l in last]): > # All iterators contain last[0] > yield last[0] > last = [iterator.next() for iterator in iterators] > > # Now go over the iterators once and advance them as > # necessary. To stop as soon as the smallest iterator we > # advance each iterator only once per loop iteration. > for i in indices[:-1]: > if last[i] < last[i+1]: > last[i] = iterators[i].next() > if last[i] > last[i+1]: > last[i+1] = iterators[i+1].next()
I found my answer: Python 2.6 introduces heap.merge(), which is designed exactly for this. Thanks all, David. -- http://mail.python.org/mailman/listinfo/python-list