Lasse Vågsæther Karlsen wrote: > <snip> > > Another idea for this method would be that in some cases I noticed that > it was useful to know which source each element would come from as well, > as well as removing duplicates from the results. > <snip>
The "removing duplicates" problem would probably be best as a separate function and it occurs to me that perhaps Python has such a function already. Again, this function would need the following criteria: 1. Be able to iterate through something other than a list 2. Yield the values, not return a list 3. Take an arbitrary cmp function to determine what is a duplicate As sugar, perhaps also the following criteria: - Ability to "combine" the duplicates through a special function A simple first-version function I hacked together does this: def unique(source, cmp=cmp, key=None, combine=None): it = iter(source) first = True value = it.next() values = [value] while True: try: value = it.next() if key is not None: cmp_result = cmp(values[0][key], value[key]) else: cmp_result = cmp(values[0], value) if cmp_result == 0: values.append(value) else: if combine is not None: yield combine(values) else: yield values[0] values = [value] except StopIteration: if combine is not None: yield combine(values) else: yield values[0] break raise StopIteration Note that this function does not do any sorting so if the source that it gets the values from is not sorted, the result will be very wrong. This is again due to my criteria of being able to handle cursors retrieving data from a database and thus avoid loading everything into memory. The combine function is passed a list of "duplicate" values and must return a value that will be yielded out of unique. Example of usage: def sum_counts(values): value = values[0][0] sum = 0 for row in values: sum += row[1] return value, sum fruits = [["Apple", 10], ["Apple", 15], ["Banana", 23], ["Orange", 17], ["Orange", 17]] for fruit, total_sum in unique(fruits, key=0, combine=sum_counts): print fruit, "has a sum of", total_sum This will produce: Apple has a sum of 25 Banana has a sum of 23 Orange has a sum of 34 Function name is perhaps not the best one. It occurs to me that this is the GROUP BY function in SQL so perhaps a different name is better, but then again this might be moot if such a function already exists somewhere :) -- Lasse Vågsæther Karlsen http://usinglvkblog.blogspot.com/ mailto:[EMAIL PROTECTED] PGP KeyID: 0x2A42A1C2 -- http://mail.python.org/mailman/listinfo/python-list