psaff...@googlemail.com wrote:

> Thanks for your replies. Many apologies for not including the right
> information first time around. More information is below.
>
> I have tried running it just on the csv read:
> $ ./largefilespeedtest.py
> working at file largefile.txt
> finished: 3.860000.2
>
> A tiny bit of background on the final application: this is biological
> data from an affymetrix platform. The csv files are a chromosome name,
> a coordinate and a data point, like this:
>
> chr1  3754914  1.19828
> chr1  3754950  1.56557
> chr1  3754982  1.52371
>
> In the "simple data structures" code below, I do some jiggery pokery
> with the chromosome names to save me storing the same string millions
> of times.
>
> $ ./affyspeedtest.py
> reading affy file largefile.txt
> finished: 15.540000.2

It looks like most of the time is not spent in the csv.reader(). Here's
an alternative way to read your data:

    rows = fh.read().split()
    coords = numpy.array(map(int, rows[1::3]), dtype=int)
    points = numpy.array(map(float, rows[2::3]), dtype=float)
    chromio.writelines(map(chrommap.__getitem__, rows[::3]))

Do things improve if you simplify your code like that?

Peter
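The "jiggery pokery" itself isn't shown in the thread, but one common way
to avoid storing the same chromosome string millions of times is to intern
each name in a dict and keep only a short per-row code. A minimal sketch,
assuming single-character codes assigned in order of first appearance (the
poster's actual scheme may differ):

    # Map each chromosome name to a one-character code so the full
    # string is stored only once, no matter how many rows mention it.
    chrommap = {}

    def code_for(name):
        # Assign codes "A", "B", ... the first time a name appears.
        if name not in chrommap:
            chrommap[name] = chr(ord("A") + len(chrommap))
        return chrommap[name]

    codes = [code_for(name) for name in ("chr1", "chr1", "chr2")]
    print("".join(codes))    # -> "AAB"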
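To try the sliced-read approach end to end, here is a self-contained
sketch; the file name, the file handling and the two-entry chrommap are
assumptions standing in for the poster's setup:

    import numpy

    # Assumed stand-ins for the poster's setup.
    chrommap = {"chr1": "A", "chr2": "B"}

    with open("largefile.txt") as fh:
        rows = fh.read().split()   # flat token list: name, coord, point, ...

    # Extended slices pick out each column: names at indices 0, 3, 6, ...,
    # coordinates at 1, 4, 7, ... and data points at 2, 5, 8, ...
    coords = numpy.array([int(x) for x in rows[1::3]], dtype=int)
    points = numpy.array([float(x) for x in rows[2::3]], dtype=float)
    codes = "".join(chrommap[name] for name in rows[::3])

Reading the whole file once and slicing trades memory (the flat token
list) for speed, since it avoids per-row parsing overhead entirely.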