On Wed, Jun 10, 2009 at 6:24 PM, David Wilson<d...@botanicus.net> wrote: > During a fun coding session yesterday, I came across a problem that I > thought was already solved by itertools, but on investigation it seems > it isn't. > > The problem is simple: given one or more ordered sequences, return > only the objects that appear in each sequence, without reading the > whole set into memory. This is basically an SQL many-many join. > > I thought it could be accomplished through recursively embedded > generators, but that approach failed in the end. After posting the > question to Stack Overflow[0], Martin Geisler proposed a wonderfully > succinct and reusable solution (see below, or pretty printed at the > Stack Overflow URL). > [snip]
Here's my version; keep a list of (curr_val, iterator) tuples and operate on those. def intersect(seqs): iter_pairs = [(it.next(), it) for (it) in map(iter, seqs)] while True: min_val = min(iter_pairs)[0] max_val = max(iter_pairs)[0] if min_val == max_val: yield min_val max_val += 1 # everybody advances for i, (val, it) in enumerate(iter_pairs): if val < max_val: iter_pairs[i] = (it.next(), it) # end while True Interestingly you don't need to explicitly catch StopIteration and return because only the top level is a generator. So both lines where it.next() are called will potentially end the loop. I also tried using a defaultdict(list) as the main structure; it worked but was uglier by far { curr_val => [it1, it2, ..]} with dels and appends. -Jack ps, woops, I forgot to hit reply all the first time. -- http://mail.python.org/mailman/listinfo/python-list