On Jun 1, 12:34 am, Raymond Hettinger <[EMAIL PROTECTED]> wrote:
>
> I would do it in two steps.  There are a number of ways to merge depending
> on whether everything is pulled into memory or not:
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/491285
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/305269
>
> After merging, the groupby itertool is good for removing duplicates:
>
>     result = [k for k, g in groupby(imerge(*sources))]
>
> Raymond
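A minimal sketch of the two-step approach Raymond describes, using heapq.merge (the stdlib equivalent of the imerge() recipe since Python 2.6) on some made-up sorted sequences standing in for "sources":

```python
from heapq import merge          # stdlib equivalent of the imerge() recipe
from itertools import groupby

# Hypothetical already-sorted input sequences standing in for "sources".
sources = [[1, 3, 5, 7], [2, 3, 6], [1, 4, 7]]

# merge() lazily combines sorted iterables into one sorted stream;
# groupby() then collapses each run of equal values down to its key.
result = [k for k, g in groupby(merge(*sources))]
print(result)  # [1, 2, 3, 4, 5, 6, 7]
```

Because the merged stream is sorted, all duplicates are adjacent, so groupby's consecutive-only deduplication is sufficient here.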
Thanks for the tip; itertools never ceases to amaze. One issue: groupby
doesn't seem to remove all duplicates, just consecutive ones (for lists
of strings and integers, at least):

>>> [k for k, g in itertools.groupby(list("asdfdfffdf"))]
['a', 's', 'd', 'f', 'd', 'f', 'd', 'f']

Another issue: dropping everything into a heap and pulling it back out
looks like it loses the original ordering, which isn't necessarily
alphabetical, but "however the user wants to organize the spreadsheet".
That's why I originally avoided using
sorted(set(itertools.chain(*sources))). Do you see another way around
this?
--
http://mail.python.org/mailman/listinfo/python-list
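One possible way around the ordering problem (not from the thread, just a sketch): skip sorting entirely and deduplicate with a seen-set, which keeps items in first-seen order. The sequences below are hypothetical spreadsheet columns in a user-chosen, non-alphabetical order:

```python
from itertools import chain

def unique_in_order(*sources):
    """Yield items in first-seen order, dropping later duplicates.

    Unlike sorted(set(...)) or a heap-based merge, this preserves
    whatever ordering the user gave the data.
    """
    seen = set()
    for item in chain(*sources):
        if item not in seen:
            seen.add(item)
            yield item

# Hypothetical columns, ordered "however the user wants".
cols = [["name", "zip", "age"], ["zip", "city", "name"]]
print(list(unique_in_order(*cols)))  # ['name', 'zip', 'age', 'city']
```

The trade-off versus the merge/groupby approach is memory: the seen-set grows with the number of distinct items, whereas merging sorted streams stays lazy.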