Here's an algorithm question: How should I efficiently merge a collection of mostly similar lists, with different lengths and arbitrary contents, while eliminating duplicates and preserving order as much as possible?
My code:

    def merge_to_unique(sources):
        """Merge the unique elements from each list in sources into a new list.

        Using the longest input list as a reference, merges in the
        elements from each of the smaller or equal-length lists, and
        removes duplicates.

        @return: Combined list of elements.
        """
        sources.sort(key=len, reverse=True)  # Descending length
        ref = sources[0]
        for src in sources[1:]:
            for i, s in enumerate(src):
                if s and ref[i] != s and s not in ref:
                    # Splice the new element in right after the position
                    # that its predecessor in src occupies in ref (or at
                    # the front, when it is src's first element).
                    pos = ref.index(src[i - 1]) + 1 if i else 0
                    ref.insert(pos, s)
        # Drop empty elements and duplicates, keeping each element's
        # last occurrence.
        return [r for i, r in enumerate(ref) if r and r not in ref[i + 1:]]

This comes up when using the csv module's DictWriter class to merge a set (a list, here) of not-quite-perfect CSV sources. The DictWriter constructor needs a list of field names so that it can convert dictionaries into rows of the CSV file it writes. Some of the input CSV files are missing columns, some might have extras -- all of this should be accepted, and the order of the columns in the merged file should match the order of the input files as much as possible (not alphabetical). All of the list elements are strings in this case, but it would be nice if the function didn't require that.

Speed isn't actually a problem yet; it might matter some day, but for now it's just an issue of conceptual aesthetics. Any suggestions?
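For what it's worth, here's the behavior I'm after, with made-up field lists; the second list's extra 'fax' column gets spliced in right after 'name', where that list put it, rather than tacked on at the end:

    >>> merge_to_unique([['name', 'email', 'phone'], ['name', 'fax', 'email']])
    ['name', 'fax', 'email', 'phone']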
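And for context, roughly how I expect to use the result. This is just a sketch in Python 3 spelling, with hypothetical file names and no error handling:

    import csv

    # Hypothetical input files; each may be missing some columns or
    # have extra ones.
    paths = ['alpha.csv', 'beta.csv']
    readers = [csv.DictReader(open(path, newline='')) for path in paths]

    # Copy each header row before merging, since merge_to_unique
    # rearranges its inputs in place.
    fieldnames = merge_to_unique([list(r.fieldnames) for r in readers])

    with open('merged.csv', 'w', newline='') as out:
        writer = csv.DictWriter(out, fieldnames, restval='')
        writer.writeheader()
        for reader in readers:
            for row in reader:
                writer.writerow(row)  # restval='' fills missing columns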