On Thu, Jun 11, 2009 at 2:54 PM, Terry Reedy<tjre...@udel.edu> wrote: > Jack Diederich wrote: >> >> On Thu, Jun 11, 2009 at 12:03 AM, David M. Wilson<d...@botanicus.net> wrote: >> [snip] >>> >>> I found my answer: Python 2.6 introduces heap.merge(), which is >>> designed exactly for this. >> >> Thanks, I knew Raymond added something like that but I couldn't find >> it in itertools. >> That said .. it doesn't help. Aside, heapq.merge fits better in >> itertools (it uses heaps internally but doesn't require them to be >> passed in). The other function that almost helps is >> itertools.groupby() and it doesn't return an iterator so is an odd fit >> for itertools. >> >> More specifically (and less curmudgeonly) heap.merge doesn't help for >> this particular case because you can't tell where the merged values >> came from. You want all the iterators to yield the same thing at once >> but heapq.merge muddles them all together (but in an orderly way!). >> Unless I'm reading your tokenizer func wrong it can yield the same >> value many times in a row. If that happens you don't know if four >> "The"s are once each from four iterators or four times from one. > > David is looking to intersect sorted lists of document numbers with > duplicates removed in order to find documents that contain worda and wordb > and wordc ... . But you are right that duplicate are a possible fly in the > ointment to be removed before merging.
Ah, in that case the heap.merge solution is both useful and succinct: import heapq import itertools def intersect(its): source = heapq.merge(*its) while True: sames = [source.next()] sames.extend(itertools.takewhile(lambda v:v == sames[0], source)) if len(sames) == len(its): yield sames[0] return -Jack -- http://mail.python.org/mailman/listinfo/python-list