I have a function uniqueList, shown below:
def uniqueList(origList):
    nodups = {}
    for temp in origList:
        nodups[temp] = None
    return nodups.keys()
When used in the following context:
industryList = uniqueList(jpbarradata[group])
where jpbarradata[group] might look like
["AAA BC","BBB KK","CCC TD","AAA KP","CCC TD"]
the function works in the sense that it would return
["AAA BC","BBB KK","CCC TD","AAA KP"]
because CCC TD is duplicated.
But I also want it to get rid of AAA KP, because
there are two AAA's even though the last two letters
are different. It doesn't matter to me which one
is removed, but I don't know how to change
the function to handle this. I have a feeling
it's not that hard, though. Thanks.
Hi Mark,
please turn off the HTML formatting when posting. It makes your message quite a lot bigger than need be and, in this case anyway, makes the plain-text version double spaced (as above) and thus a bit nasty to read. Thanks.
For the question:
Is order in your output important? If so, I wouldn't use a dictionary to store the unique items. I see why you did it, but since dictionaries don't have order, your output might get permuted.
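A note on the ordering point: the lack of ordering applies to the Python versions current for this thread. From Python 3.7 onward, dicts are guaranteed to keep insertion order, so a dict keyed on the leading characters would also preserve order there. A minimal sketch under that assumption (the name unique_by_prefix_dict is made up for illustration):

```python
def unique_by_prefix_dict(orig_list, n):
    """Keep the first element seen for each distinct n-character prefix."""
    first_seen = {}
    for member in orig_list:
        # setdefault stores member only if this prefix is not already a key,
        # so the first element with a given prefix wins.
        first_seen.setdefault(member[:n], member)
    # Assumption: Python 3.7+, where dicts preserve insertion order.
    return list(first_seen.values())
```

On older interpreters the returned order is not guaranteed, which is the concern raised above.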
How about this (don't take the naming as a model!):
def unique_up_to_n_char(orig_list, n):
    '''-> list of elements where each is unique up to the first n chars.
    '''
    # Needs Python 2.4 for set type. You could use a list, too.
    seen_leading_chars = set()

    output_list = []
    for member in orig_list:
        if member[:n] not in seen_leading_chars:
            seen_leading_chars.add(member[:n])
            output_list.append(member)
    return output_list

test_list = ["AAA BC", "BBB KK", "CCC TD", "AAA KP", "CCC TD", "AAB KP"]
print unique_up_to_n_char(test_list, 3)
print unique_up_to_n_char(test_list, 2)

which produces:
['AAA BC', 'BBB KK', 'CCC TD', 'AAB KP']
['AAA BC', 'BBB KK', 'CCC TD']
There may be still better ways, but this is general and preserves order.
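One possible variation, if the codes before the space are not always three characters long: compare on the first whitespace-separated word instead of a fixed number of characters. This is only a sketch. The names unique_by_key and first_word are hypothetical, and it is written for Python 3 (print as a function), unlike the Python 2 code above.

```python
def unique_by_key(orig_list, key):
    """Return the elements whose key(element) has not been seen before,
    in their original order."""
    seen_keys = set()
    output_list = []
    for member in orig_list:
        k = key(member)
        if k not in seen_keys:
            seen_keys.add(k)
            output_list.append(member)
    return output_list


def first_word(s):
    # "AAA KP" -> "AAA"; an empty or all-whitespace string maps to "".
    parts = s.split(None, 1)
    return parts[0] if parts else ""


industries = ["AAA BC", "BBB KK", "CCC TD", "AAA KP", "CCC TD"]
print(unique_by_key(industries, first_word))
# prints ['AAA BC', 'BBB KK', 'CCC TD']
```

Passing a different key function, for example lambda s: s[:3], gives the same behaviour as the fixed-length version.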
Best,
Brian vdB