> for line in file:

The first thing I would try is just doing a

  for line in file: pass

to see how much time is consumed merely by iterating over the file. This should give you a baseline against which to compare your other timings.

>     data = line.split()
>     first = int(data[0])
>
>     if len(data) == 1:
>         count = 1
>     else:
>         count = int(data[1])    # more than one repetition

Well, some experiments I might try:

  try:
    first, count = map(int, data)
  except ValueError:
    # only one item on the line, so unpacking failed
    first = int(data[0])
    count = 1

or possibly

  first = int(data[0])
  try:
    count = int(data[1])
  except IndexError:
    # no second item, so default to a single repetition
    count = 1

or even

  # pad it to contain at least two items
  # then slice off the first two
  # and then map() calls to int()
  first, count = map(int, (data + [1])[:2])

I don't know how efficient len() is (if it's internally linearly counting the items in data, or if it's caching the length as data is created/assigned/modified) and how that efficiency compares to try/except blocks, map() or int() calls. I'm not sure any of them is more or less "pythonic", but they should all do the same thing.

>     if first in hist:    # add the information to the histogram
>         hist[first]+=count
>     else:
>         hist[first]=count

This might also be written as

  hist[first] = hist.get(first, 0) + count

> Is a dictionary the right way to do this? In any given file, there is
> an upper bound on the data, so it seems to me that some kind of array
> (numpy?) would be more efficient, but the upper bound changes in each
> file.

I'm not sure an array would net you great savings here, since the upper bound seems to be an unknown. If "first" has a known maximum (surely, the program generating this file has an idea of the range of allowed values), you could just create an array the length of the span of numbers, initialized to zero, which would reduce the hist.get() call to just

  hist[first] += count

and then you'd iterate over hist (which would already be sorted because it's in index order) and use those entries where count != 0 to avoid the holes.

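
For what it's worth, here is a rough sketch of the array idea next to the dictionary version, so the two can be compared on the same input. It assumes non-negative integer values and a known upper bound; the names parse, hist_dict, hist_list, max_value and sample are made up for illustration and aren't from your code.

```python
def parse(line):
    """Return (first, count); count defaults to 1 when the line has one field."""
    data = line.split()
    # Pad to at least two items, keep the first two, convert both to int.
    first, count = map(int, (data + [1])[:2])
    return first, count


def hist_dict(lines):
    """Dictionary histogram: works without knowing the upper bound."""
    hist = {}
    for line in lines:
        first, count = parse(line)
        hist[first] = hist.get(first, 0) + count
    return sorted(hist.items())


def hist_list(lines, max_value):
    """Preallocated-list histogram; max_value is an assumed known upper bound."""
    hist = [0] * (max_value + 1)
    for line in lines:
        first, count = parse(line)
        hist[first] += count
    # Index order is already sorted; skip the holes where nothing was counted.
    return [(value, count) for value, count in enumerate(hist) if count != 0]


sample = ["3", "7 2", "3 4", "0"]  # made-up data, not from the real file
print(hist_dict(sample))
print(hist_list(sample, 7))
```

Both calls should produce the same pairs. Wrapping each one in timeit.timeit() on a realistically sized input would tell you whether the list version actually buys anything over the dictionary.
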
Otherwise, your code looks good...the above just riff on various ways of rewriting your code in case one nets you extra time-savings per loop.

-tkc