On Jun 2, 11:08Â pm, Raymond Hettinger <[EMAIL PROTECTED]> wrote: > > If the inputs were not sorted, then I don't think you have a precise > idea of what it means to merge them while preserving order. Â For > example if the inputs are XYZPDQ and bYlmPz, then what does a merged > sequence look like once the Y and P duplicates are removed? Is it > XZPDQblmz or some intermix of the two like XbZlPmDzQ? Â If it is the > first of those, then the answer is simple: >
I was looking at Bram Cohen's description of his diff algorithm, implemented in Bazaar: http://bramcohen.livejournal.com/37690.html "Instead of doing a longest common subsequence on everything, it does a longest common subsequence on lines which occur exactly once on both sides, then recurses between lines which got matched on that pass." So, if sources looks like: [list("XYZPDQ"), list("bYlmPz")] Then the proper merge would use Y and P as the delimiters, putting X and b before Y; Z, l and m before P; and D, Q and z after P -- like your second example (but keeping an instance of Y). Right? Leaving us with the context-dependent issue of ordering between the delimiters, probably depending on which list is used as the reference initially. So, if the reference list's elements go first, the result would be XbYZlmPDQz. If the elements from later sources go after the last common element (my first approach), it's bXYlmZPzDQ. However, in the cold light of morning it looks like that was the wrong approach -- it makes sense to treat any data that doesn't occur in the reference list as semi-obsolete and append it instead -- so I think I'll use your much simpler algorithm. Thanks! -- http://mail.python.org/mailman/listinfo/python-list