deduping

dirknbr Mon, 21 Jun 2010 05:43:01 -0700

Hi

I have two files (done and outf), and I want to collect the unique values
from the 2nd column of outf that do not appear in done. The code below works
but is not efficient; can you think of a quicker way? The a=1 is obviously
just a placeholder that does nothing. I wrote the tests this way around
because I think 'in' is quicker than 'not in' - is that true?


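# done and outf are already-open file objects
# collect every line of done as a dict key (the values are unused)
done_ = {}
for line in done:
    done_[line.strip()] = 0

print len(done_)

# keep each 2nd-column value of outf that is neither seen before nor in done_
universe = {}
for line in outf:
    if line.split(',')[1].strip() in universe.keys():
        a = 1  # already collected, nothing to do
    else:
        if line.split(',')[1].strip() in done_.keys():
            a = 1  # already in done, nothing to do
        else:
            universe[line.split(',')[1].strip()] = 0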
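
For comparison, here is a minimal sketch of one possible quicker variant, not a definitive answer. It assumes both inputs are iterables of text lines and that every line of outf has at least two comma-separated fields. The function name unique_new_values and the sample data are made up for illustration. Two points lie behind it. First, 'in' and 'not in' do the same lookup, and 'not in' only negates the result, so neither should be noticeably faster. Second, in Python 2 a test like "x in d.keys()" first builds a list of all keys and then scans it, whereas "x in d" or a set membership test is a single hash lookup.

```python
def unique_new_values(done_lines, outf_lines):
    """Return the distinct 2nd-column values of outf_lines not found in done_lines.

    Hypothetical helper: both arguments are assumed to be iterables of
    text lines, such as open file objects or lists of strings.
    """
    # A set gives hash-based membership tests, with no list of keys to scan.
    done_set = set(line.strip() for line in done_lines)

    universe = set()
    for line in outf_lines:
        # Split and strip once per line instead of up to three times.
        value = line.split(',')[1].strip()
        if value not in done_set:
            universe.add(value)  # adding an existing value is a no-op
    return universe


# Made-up sample data standing in for the two files.
sample_done = ["a\n", "b\n"]
sample_outf = ["1,a,x\n", "2,c,y\n", "3,c,z\n", "4, d ,w\n"]
result = unique_new_values(sample_done, sample_outf)
# sorted(result) -> ['c', 'd']
```

With real files the call would look like unique_new_values(done, outf). The set removes the need for a separate "already in universe" test, since duplicates are dropped on insertion.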

Dirk
-- 
http://mail.python.org/mailman/listinfo/python-list
