Re: number of different lines in a file

Ben Stroud Fri, 19 May 2006 08:22:23 -0700

>
>It never occurred to me to use the Python dict/set approach.  Now I
>wonder if it would've worked better somehow.  Of course my file was
>26,000 times larger than the one in this problem, and definitely would
>not fit in memory.  I suspect that there were as many as a million
>duplicates for some messages in that file.  Would the generator
>version above have helped me out, I wonder?
>


You could use a dbm file approach, which provides an external, on-disk
dict/set-like interface through Python bindings.  Because the keys live
in a file rather than in RAM, this would use less memory.

1.  Add each record (line) to the dbm as a key
2.  dbm (if configured correctly) will keep only unique keys
3.  Count the keys
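A minimal sketch of those three steps, assuming Python 3 and the
standard-library dbm module (the post doesn't name a particular dbm
backend; dbm.open picks whichever one is installed).  The function name
count_unique_lines and the file names are made up for illustration:

```python
import dbm
import os
import tempfile


def count_unique_lines(path, db_path):
    """Return the number of distinct lines in the file at `path`.

    Hypothetical helper: an on-disk dbm database at `db_path` acts as
    the set, so only the keys on disk grow with the input, not RAM.
    """
    # Flag "n": always create a new, empty database at db_path.
    with dbm.open(db_path, "n") as db:
        with open(path, "rb") as f:
            for line in f:
                # Step 1: normalise the line ending so a final line
                # without "\n" matches its duplicates; the appended "\n"
                # also keeps every key non-empty.
                key = line.rstrip(b"\r\n") + b"\n"
                # Step 2: storing an existing key just overwrites it,
                # so each distinct line is kept only once.
                db[key] = b"1"
        # Step 3: the number of keys is the number of distinct lines.
        return len(db)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = os.path.join(tmp, "messages.txt")
        with open(data, "w") as f:
            f.write("alpha\nbeta\nalpha\ngamma\nbeta\nalpha")
        print(count_unique_lines(data, os.path.join(tmp, "seen")))  # → 3
```

The backend chosen by dbm.open varies by platform (e.g. gdbm, ndbm or the
pure-Python dbm.dumb), so speed and on-disk layout will differ.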

Cheers,
Ben

-- 
http://mail.python.org/mailman/listinfo/python-list
