...and, just for fun, this D code is about 3.2 times faster than the Psyco version on the same dataset (in which 30% of the lines contain a space):
import std.stdio, std.conv, std.string, std.stream;

// Builds a histogram from a text file: a line holding a single integer
// adds 1 to that key; a line "key value" adds value to that key.
int[int] get_hist(string file_name) {
    int[int] hist;
    foreach (string line; new BufferedFile(file_name)) {
        int pos = find(line, ' ');   // index of the first space, -1 if none
        if (pos == -1)
            hist[toInt(line)]++;
        else
            hist[toInt(line[0 .. pos])] += toInt(line[pos+1 .. $]);
    }
    return hist;
}

void main(string[] args) {
    // Print the number of distinct keys.
    writefln(get_hist(args[1]).length);
}

Bye,
bearophile
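For comparison, the same histogram logic written in plain Python might look like the sketch below. It is not the Psyco version that was benchmarked (that code isn't shown here); it only assumes the input format implied by the D code: one integer per line, optionally followed by a space and a second integer to add to that key.

```python
from collections import defaultdict


def get_hist(file_name):
    """Histogram matching the D get_hist: a line with one integer adds 1 to
    that key; a line "key value" adds value to that key."""
    hist = defaultdict(int)
    with open(file_name) as f:
        for line in f:
            line = line.rstrip("\n")
            # Split on the first space only, like find() plus slicing in D.
            key, sep, value = line.partition(" ")
            if sep:
                hist[int(key)] += int(value)
            else:
                hist[int(key)] += 1
    return hist


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        # Print the number of distinct keys, as the D main() does.
        print(len(get_hist(sys.argv[1])))
```

Using `defaultdict(int)` mirrors D's `hist[key]++` on a missing key: an absent key starts at 0 before being incremented.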