On Jun 1, 1:49 am, Peter Otten <[EMAIL PROTECTED]> wrote:
> Peter Otten wrote:
> > #untested
>
> Already found two major blunders :(
>
> # still untested
> import difflib
>
> def _merge(a, b):
>     sm = difflib.SequenceMatcher(None, a, b)
>     for op, a1, a2, b1, b2 in sm.get_opcodes():
>         if op == "insert":
>             yield b[b1:b2]
>         elif op == "replace":
>             yield a[a1:a2]
>             yield b[b1:b2]
>         else:  # delete, equal
>             yield a[a1:a2]
>
> def merge(a, b):
>     return sum(_merge(a, b), [])
>
> def merge_to_unique(sources):
>     return unique(reduce(merge, sorted(sources, key=len, reverse=True)))
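As a quick sanity check (my addition, not part of Peter's post): the quoted functions run as posted, though on Python 3 merge_to_unique() also needs `from functools import reduce`. A minimal sketch exercising merge() alone:

```python
# Runnable version of the quoted merge() helpers (sketch, lightly commented).
import difflib

def _merge(a, b):
    # Walk the SequenceMatcher opcodes, emitting chunks so that both
    # inputs survive as subsequences of the combined result.
    sm = difflib.SequenceMatcher(None, a, b)
    for op, a1, a2, b1, b2 in sm.get_opcodes():
        if op == "insert":
            yield b[b1:b2]
        elif op == "replace":
            yield a[a1:a2]
            yield b[b1:b2]
        else:  # "delete" or "equal"
            yield a[a1:a2]

def merge(a, b):
    # Flatten the chunks into a single list.
    return sum(_merge(a, b), [])

# Both 'abcd' and 'axcy' remain subsequences of the merged result.
print(merge(list("abcd"), list("axcy")))
```

Note that merge() preserves each input as a subsequence of the output, but it does not deduplicate; that is what the unique() pass is for.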
difflib.SequenceMatcher looks promising; I'll try it. Thanks!

> def unique(items):
>     u = set(items)
>     if len(u) == len(items):
>         return items
>     result = []
>     for item in items:
>         if item in u:
>             result.append(item)
>             u.remove(item)
>     return result

You did right by preserving the original (non-alphabetical) ordering, but I'm less enthusiastic about the shape of this function. My original function used 7 lines of code, and only 1 for the unique() step. This version uses up to three container objects. Is it really an improvement?

(Secret: the reference list (or any of the sources) is unlikely to be more than a few dozen elements long. The data set that puts merge_to_unique() through a workout will be a giant list of comparatively short lists, so the unique() part just needs to be short and conceptually clean, while merge() should attempt sane behavior for large len(sources).)

--
http://mail.python.org/mailman/listinfo/python-list
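P.S. For a unique() that stays short and conceptually clean, one possible sketch (mine, not Peter's; same first-occurrence, order-preserving semantics, and it assumes hashable elements):

```python
def unique(items):
    """Keep the first occurrence of each element, preserving order."""
    seen = set()
    # seen.add(x) returns None, so the `or` clause records x as a side effect
    # exactly when x has not been seen before.
    return [x for x in items if not (x in seen or seen.add(x))]

print(unique([3, 1, 3, 2, 1]))  # -> [3, 1, 2]
```

One set, one list, two lines of body; the early-out for already-unique input is dropped, which for lists of a few dozen elements should cost nothing measurable.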