Re: newbie:unique problem

Brian van den Broek Thu, 17 Mar 2005 12:50:42 -0800

Leeds, Mark said unto the world upon 2005-03-17 14:08:

I have a function uniqueList that is below :

def uniqueList(origList):

    nodups = {}

    for temp in origList:

        nodups[temp] = None

    return nodups.keys()

When used in the following context :

industryList = uniqueList(jpbarradata[group])

where jpbarradata[group] might look like

["AAA BC","BBB KK","CCC TD","AAA KP","CCC TD"]

,the function works in the sense that it would return

["AAA BC","BBB KK","CCC TD","AAA KP"]

because CCC TD is duplicated.

But, I also want it to get rid of the AAA KP because

there are two AAA's even though the last two letters

are different. It doesn't matter to me which one

is gotten rid of but I don't know how to change

the function to handle this ? I have a feeling

it's not that hard though ? Thanks.


Hi Mark,

Please turn off the HTML formatting when posting. It makes your message quite a lot bigger than it needs to be and, in this case anyway, makes the plain-text version double-spaced (as above) and thus a bit nasty to read. Thanks.

For the question:

Is order in your output important? If so, I wouldn't use a dictionary to store the unique items. I see why you did it, but since dictionaries don't have order, your output might get permuted.

How about this (don't take the naming as a model!):

def unique_up_to_n_char(orig_list, n):
    '''-> list of elements where each is unique up to the first n chars.
    '''

    # Needs Python 2.4 for set type. You could use a list, too.
    seen_leading_chars = set()

    output_list = []
    for member in orig_list:
        if member[:n] not in seen_leading_chars:
            seen_leading_chars.add(member[:n])
            output_list.append(member)
    return output_list

test_list = ["AAA BC", "BBB KK", "CCC TD", "AAA KP", "CCC TD", "AAB KP"]

print(unique_up_to_n_char(test_list, 3))
print(unique_up_to_n_char(test_list, 2))

which produces:
['AAA BC', 'BBB KK', 'CCC TD', 'AAB KP']
['AAA BC', 'BBB KK', 'CCC TD']

There may be still better ways. But, this is general and preserves order.
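
One possible variation, offered only as a sketch: if the leading code in your data is not always exactly n characters long, you could compare on whatever comes before the first space rather than on a fixed-width prefix. The names unique_by_key and first_field below are made up for illustration, and I am assuming the part you care about is the first whitespace-separated field of each string.

```python
def unique_by_key(items, key):
    """Return items in their original order, keeping only the first
    item seen for each value of key(item).

    Hypothetical helper, not part of any library.
    """
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result


def first_field(s):
    """Return the text before the first run of whitespace.

    Assumption: the identifying code is the first field,
    e.g. "AAA BC" -> "AAA". Empty or blank strings map to "".
    """
    parts = s.split(None, 1)
    if parts:
        return parts[0]
    return ""


test_list = ["AAA BC", "BBB KK", "CCC TD", "AAA KP", "CCC TD", "AAB KP"]

by_field = unique_by_key(test_list, first_field)
print(by_field)
```

On the test list above this keeps 'AAA BC', 'BBB KK', 'CCC TD' and 'AAB KP', the same as the n = 3 case, but it would keep working if some codes were longer or shorter than three characters. Passing the comparison in as a function also means the fixed-width behaviour is still available by handing in a function that returns s[:n].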

Best,

Brian vdB

--
http://mail.python.org/mailman/listinfo/python-list
