Marc 'BlackJack' Rintsch wrote:
On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote:
As this was horribly slow (20 minutes for a 2 GB file), I also coded the whole
thing in C:
Yours took ~37 minutes for 2 GiB here. This one takes "just" ~15 minutes:
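#!/usr/bin/env python
from __future__ import division, with_statement
import os
import sys
from collections import defaultdict
from functools import partial
from itertools import imap

def iter_max_values(blocks, block_count):
    """Yield the most frequent byte of each block."""
    for i, block in enumerate(blocks):
        # Count how often each byte occurs in this block.
        histogram = defaultdict(int)
        for byte in block:
            histogram[byte] += 1
        # Pair each count with its byte value so max() picks the most
        # frequent byte (ties go to the larger byte value).
        yield max((count, value)
                  for value, count in histogram.iteritems())[1]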
[snip]
Would it be faster if histogram were a list initialised to [0] * 256?
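A minimal sketch for trying the question out. It is written for Python 3, which is an assumption: the script above is Python 2, where iterating a str yields one-character strings, so there the list version would need `histogram[ord(byte)] += 1`. The names `max_byte_dict` and `max_byte_list` are hypothetical. Both count the bytes of one block and return the most frequent value; no timings are claimed here, so run it on your own data.

```python
import os
import timeit
from collections import defaultdict


def max_byte_dict(block):
    """Most frequent byte value in `block`, counted with a defaultdict."""
    histogram = defaultdict(int)
    for byte in block:  # in Python 3, iterating bytes yields ints 0..255
        histogram[byte] += 1
    # Ties are broken in favour of the larger byte value.
    return max((count, value) for value, count in histogram.items())[1]


def max_byte_list(block):
    """Same result, counted in a flat list indexed by byte value (0..255)."""
    histogram = [0] * 256
    for byte in block:
        histogram[byte] += 1
    # Pair each count with its index; max() again prefers the larger value on ties.
    return max((count, value) for value, count in enumerate(histogram))[1]


if __name__ == "__main__":
    block = os.urandom(64 * 1024)  # random test data, not a real input file
    assert max_byte_dict(block) == max_byte_list(block)
    for fn in (max_byte_dict, max_byte_list):
        print(fn.__name__, timeit.timeit(lambda: fn(block), number=20))
```

The list avoids a dict lookup per byte, which is the usual reason to expect it to be quicker, but only a measurement settles it. One behavioural difference: for an empty block the list version returns 255, while the dict version raises ValueError.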
--
http://mail.python.org/mailman/listinfo/python-list