On Fri, 18 Jan 2008 09:58:57 -0800, Paul Rubin wrote:

> David Sanders <[EMAIL PROTECTED]> writes:
>> The data files are large (~100 million lines), and this code takes a
>> long time to run (compared to just doing wc -l, for example).
>
> wc is written in carefully optimized C and will almost certainly run
> faster than any python program.

However, wc -l doesn't do what the Original Poster is trying to do.
There is little comparison between counting the number of lines and
building a histogram, except that both tasks have to see each line.
Naturally the second task will take longer than wc.

("Why does it take so long to make a three-tier wedding cake? I can boil
an egg in three minutes!!!")

-- 
Steven
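
To make the comparison concrete, here is a minimal sketch of the two tasks side by side. It is not the Original Poster's code, which isn't shown in this message: the one-number-per-line input format, the function names and the bin_width parameter are all assumptions made for illustration. Both functions visit every line, but only the second has to parse and tally each one.

```python
from collections import Counter


def count_lines(lines):
    """Roughly the job of `wc -l`: visit each line, keep one running total."""
    return sum(1 for _ in lines)


def build_histogram(lines, bin_width=1.0):
    """Parse each line as a float and tally it into a bin (hypothetical format)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines rather than fail on float("")
        value = float(line)
        # Floor division maps the value to the index of its bin.
        counts[int(value // bin_width)] += 1
    return counts


# Tiny in-memory stand-in for a data file (assumed: one number per line).
sample = ["0.5\n", "1.2\n", "1.7\n", "3.0\n"]
n_lines = count_lines(sample)
hist = build_histogram(sample)
```

Either function also accepts an open file object in place of the list, since both only iterate over lines. The extra per-line work in build_histogram (strip, float conversion, dictionary update) is why it cannot be expected to keep pace with a bare line count.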