On Jun 11, 12:59 am, Jack Diederich <jackd...@gmail.com> wrote: > On Wed, Jun 10, 2009 at 6:24 PM, David Wilson<d...@botanicus.net> wrote: > > During a fun coding session yesterday, I came across a problem that I > > thought was already solved by itertools, but on investigation it seems > > it isn't. > > > The problem is simple: given one or more ordered sequences, return > > only the objects that appear in each sequence, without reading the > > whole set into memory. This is basically an SQL many-many join. > > > I thought it could be accomplished through recursively embedded > > generators, but that approach failed in the end. After posting the > > question to Stack Overflow[0], Martin Geisler proposed a wonderfully > > succinct and reusable solution (see below, or pretty printed at the > > Stack Overflow URL). > > [snip] > > Here's my version; keep a list of (curr_val, iterator) tuples and > operate on those. > > def intersect(seqs): > iter_pairs = [(it.next(), it) for (it) in map(iter, seqs)] > while True: > min_val = min(iter_pairs)[0] > max_val = max(iter_pairs)[0] > if min_val == max_val: > yield min_val > max_val += 1 # everybody advances > for i, (val, it) in enumerate(iter_pairs): > if val < max_val: > iter_pairs[i] = (it.next(), it) > # end while True
This version is a lot easier to understand. The implicit StopIteration is a double-edged sword for readability, but I like it. :) David > > Interestingly you don't need to explicitly catch StopIteration and > return because only the top level is a generator. So both lines where > it.next() are called will potentially end the loop. > I also tried using a defaultdict(list) as the main structure; it > worked but was uglier by far { curr_val => [it1, it2, ..]} with dels > and appends. > > -Jack > > ps, woops, I forgot to hit reply all the first time. -- http://mail.python.org/mailman/listinfo/python-list