On Fri, 09 Jan 2009 15:34:17 +0000, MRAB wrote:

> Marc 'BlackJack' Rintsch wrote:
>
>> def iter_max_values(blocks, block_count):
>>     for i, block in enumerate(blocks):
>>         histogram = defaultdict(int)
>>         for byte in block:
>>             histogram[byte] += 1
>>
>>         yield max((count, value)
>>                   for value, count in histogram.iteritems())[1]
>>
> [snip]
> Would it be faster if histogram was a list initialised to [0] * 256?
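[For concreteness, the list-based variant MRAB is asking about might look like the sketch below. This is an illustration only, written in Python 2 to match the thread; the name `iter_max_values_list` is invented here.]

    def iter_max_values_list(blocks):
        # Hypothetical list-based variant of Marc's generator (Python 2).
        for block in blocks:
            histogram = [0] * 256
            for byte in block:
                # Iterating over a Py2 str yields 1-char strings, so
                # ord() is needed to turn each byte into a list index.
                histogram[ord(byte)] += 1
            # The index of the largest count is the most frequent byte
            # value; chr() turns it back into a character, as in the
            # original generator.
            yield chr(histogram.index(max(histogram)))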
Don't know.  Then for every byte in the 2 GiB we have to call `ord()`.
Maybe the speedup from the list compensates for this, maybe not.

I think the main problem here is that we have to do something with
*every* byte of that really large file *at the Python level*.  In C
those are just some primitive numbers; in Python every byte comes with
all the object overhead.

Ciao,
	Marc 'BlackJack' Rintsch
--
http://mail.python.org/mailman/listinfo/python-list
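[One common way around the Python-level per-byte cost Marc describes is to let str.count() do the scanning, since its loop runs in C. A sketch, again in Python 2 and with an invented name; it trades one Python-speed pass for 256 C-speed passes over each block:]

    def iter_max_values_count(blocks):
        # Hypothetical variant: count each of the 256 possible byte
        # values with str.count(), which scans the block at C speed.
        for block in blocks:
            yield max((block.count(chr(value)), chr(value))
                      for value in xrange(256))[1]

[Whether 256 fast passes actually beat one slow pass depends on the block size and the Python implementation, so this would need timing on the real 2 GiB workload.]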