dirknbr wrote:
Hi
I have 2 files (done and outf), and I want to choose the unique elements
from the 2nd column in outf which are not in done. This code works but
is not efficient; can you think of a quicker way? The a=1 is obviously just a
redundant task. I put it this way around because I think
'in' is quicker than 'not in'. Is that true?

done_={}
for line in done:
    done_[line.strip()]=0

print len(done_)

universe={}
for line in outf:
    if line.split(',')[1].strip() in universe.keys():
        a=1
    else:
        if line.split(',')[1].strip() in done_.keys():
            a=1
        else:
            universe[line.split(',')[1].strip()]=0

Dirk

Where you have a=1, one would normally use the "pass" statement. But you're wrong that 'not in' is less efficient than 'in': 'not in' performs the same membership test and simply negates the result. If there's a difference, it's probably negligible, and almost certainly smaller than the cost of the extra else clause you're forcing here.

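A minimal Python 3 sketch (not from the original thread) that checks `x not in s` gives the same answer as `not (x in s)` and uses the standard `timeit` module to compare the two forms; the set contents and the names `done_set` and `probe` are illustrative assumptions. Timings vary by machine, so no numbers are shown.

```python
import timeit

# Illustrative data: a set of strings standing in for the "done" entries.
done_set = {"id%d" % i for i in range(10000)}
probe = "id5000"

# 'not in' is just the negation of the same membership test.
assert (probe not in done_set) == (not (probe in done_set))

# Time both forms; run this yourself, results depend on the machine.
t_in = timeit.timeit(lambda: probe in done_set, number=100000)
t_not_in = timeit.timeit(lambda: probe not in done_set, number=100000)
print("in:     %.4f s" % t_in)
print("not in: %.4f s" % t_not_in)
```
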
When doing an 'in', do *not* use the keys() method: you're replacing
a fast hash lookup in the dict with a slow linear search through a list,
not to mention the time it takes to build that keys() list each time.

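A rough timing sketch of that point, written for Python 3 (an assumption; the thread's code is Python 2). In Python 3, `dict.keys()` returns a view whose membership test is itself a hash lookup, so the slowdown described above belongs to Python 2, where `keys()` built a list; here `list(done_.keys())` reproduces that behaviour. The dict size and the key names are invented for illustration.

```python
import timeit

# Illustrative dict, as in the original post: keys map to a dummy value 0.
done_ = {"id%d" % i: 0 for i in range(10000)}
probe = "id9999"

# Fast: hash lookup directly on the dict.
fast = timeit.timeit(lambda: probe in done_, number=1000)

# Slow: build a list of all keys, then scan it linearly.
# list(done_.keys()) mimics what keys() returned in Python 2.
slow = timeit.timeit(lambda: probe in list(done_.keys()), number=1000)

print("dict lookup:       %.5f s" % fast)
print("list-of-keys scan: %.5f s" % slow)
```
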
In both these cases, you can use a set rather than a dict. And there's
no need to test whether the item is already in the set; just add it
again, since a set keeps only one copy of each element.

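A tiny illustration of why the extra check is unnecessary: `set.add()` on an element that is already present leaves the set unchanged. The sample items are made up.

```python
# A set silently ignores duplicates, so no membership check is needed before add().
universe = set()
for item in ["b", "a", "b", "c", "a"]:
    universe.add(item)  # adding an existing element is a no-op

print(sorted(universe))  # ['a', 'b', 'c']
```
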
Changing all that, you'll wind up with something like (untested)

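done_set = set()
universe = set()
# Collect every finished item, with surrounding whitespace removed.
for line in done:
    done_set.add(line.strip())
# Keep each second-column value that has not been done yet;
# a set ignores duplicates, so no separate check is needed.
for line in outf:
    item = line.split(',')[1].strip()
    if item not in done_set:
        universe.add(item)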
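
A self-contained Python 3 sketch of the same approach, using `io.StringIO` objects as stand-ins for the `done` and `outf` files (the sample data is invented for illustration). It also shows an alternative that is not in the original reply: build the set of second-column values with one set comprehension, then remove the finished items with set difference.

```python
import io

# Hypothetical sample data standing in for the two files.
done = io.StringIO("apple\ncherry\n")
outf = io.StringIO("1,apple,x\n2,banana,y\n3,cherry,z\n4,banana,w\n5,date,v\n")

# Loop version, following the reply.
done_set = set(line.strip() for line in done)
universe = set()
for line in outf:
    item = line.split(',')[1].strip()
    if item not in done_set:
        universe.add(item)

print(sorted(universe))  # ['banana', 'date']

# Alternative: set comprehension plus set difference.
outf.seek(0)  # rewind so the "file" can be read again
universe2 = {line.split(',')[1].strip() for line in outf} - done_set
print(sorted(universe2))  # ['banana', 'date']
```

Both versions read each file once and do constant-time set lookups on average; the set-difference form is simply more compact.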


DaveA

--
http://mail.python.org/mailman/listinfo/python-list
