r.e.s. wrote:

> I have a million-line text file with 100 characters per line,
> and simply need to determine how many of the lines are distinct.
> 
> On my PC, this little program just goes to never-never land:
> 
> def number_distinct(fn):
>     f = file(fn)
>     x = f.readline().strip()
>     L = []
>     while x<>'':
>         if x not in L:
>             L = L + [x]
>         x = f.readline().strip()
>     return len(L) 

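ouch.  "x not in L" scans the whole list, and "L = L + [x]" copies the
whole list each time a new line is found, so the running time grows
quadratically with the number of distinct lines.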

> Would anyone care to point out improvements? 
> Is there a better algorithm for doing this?

try this:

def number_distinct(fn):
    return len(set(s.strip() for s in open(fn)))
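
if you'd rather not rely on the interpreter to close the file for you, here's a minimal sketch of the same idea using a "with" block. it assumes a Python version that supports the "with" statement, and that lines should be compared after stripping surrounding whitespace, as in the original program:

```python
def number_distinct(fn):
    # Assumption: two lines count as equal if they match after stripping
    # leading/trailing whitespace, as in the original loop.
    with open(fn) as f:
        # A set stores each distinct value once, and membership checks are
        # hash lookups rather than scans over everything seen so far.
        return len(set(line.strip() for line in f))
```

note that, unlike the readline loop, this doesn't stop at the first blank line; an empty line just counts as one more distinct value.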

</F>
