> Thanks for the tip; itertools never ceases to amaze. One issue: > groupby doesn't seem to remove all duplicates, just consecutive ones > (for lists of strings and integers, at least): > > >>> [k for k, g in itertools.groupby(list("asdfdfffdf"))] > > ['a', 's', 'd', 'f', 'd', 'f', 'd', 'f']
That's why the example ran imerge() first. That takes several sorted inputs and gives a single sorted output. Then, groupby() eliminates the dups from the sorted input where the equal valued entries have been brought together. > Another issue: dropping everything into a heap and pulling it back out > looks like it loses the original ordering, which isn't necessarily > alphabetical, but "however the user wants to organize the > spreadsheet". That's why I originally avoided using > sorted(set(itertools.chain(*sources))). Do you see another way around > this? If your inputs were already sorted, then the above algorithms maintain that order and eliminate duplicates. If the inputs were not sorted, then I don't think you have a precise idea of what it means to merge them while preserving order. For example if the inputs are XYZPDQ and bYlmPz, then what does a merged sequence look like once the Y and P duplicates are removed? Is it XZPDQblmz or some intermix of the two like XbZlPmDzQ? If it is the first of those, then the answer is simple: seen = set() for elem in chain(*sources): if elem in seen: continue yield elem seen.add(elem) Raymond -- http://mail.python.org/mailman/listinfo/python-list