Thanks for your replies. Many apologies for not including the right information the first time around. More information is below.
I have tried running it on just the csv read:

    import time
    import csv

    afile = "largefile.txt"

    t0 = time.clock()

    print "working at file", afile
    reader = csv.reader(open(afile, "r"), delimiter="\t")
    for row in reader:
        x,y,z = row

    t1 = time.clock()

    print "finished: %f.2" % (t1 - t0)

    $ ./largefilespeedtest.py
    working at file largefile.txt
    finished: 3.860000.2

A tiny bit of background on the final application: this is biological data from an Affymetrix platform. Each line of the csv files holds a chromosome name, a coordinate and a data point, like this:

    chr1    3754914    1.19828
    chr1    3754950    1.56557
    chr1    3754982    1.52371

In the "simple data structures" code below, I do some jiggery-pokery with the chromosome names to save me storing the same string millions of times.

    import csv
    import cStringIO
    import numpy
    import time

    afile = "largefile.txt"

    chrommap = {'chrY': 'y', 'chrX': 'x', 'chr13': 'c',
                'chr12': 'b', 'chr11': 'a', 'chr10': '0',
                'chr17': 'g', 'chr16': 'f', 'chr15': 'e',
                'chr14': 'd', 'chr19': 'i', 'chr18': 'h',
                'chrM': 'm', 'chr22': 'l', 'chr20': 'j',
                'chr21': 'k', 'chr7': '7', 'chr6': '6',
                'chr5': '5', 'chr4': '4', 'chr3': '3',
                'chr2': '2', 'chr1': '1', 'chr9': '9',
                'chr8': '8'}

    def getFileLength(fh):
        wholefile = fh.read()
        numlines = wholefile.count("\n")
        fh.seek(0)
        return numlines

    count = 0
    print "reading affy file", afile
    fh = open(afile)
    n = getFileLength(fh)
    chromio = cStringIO.StringIO()
    coords = numpy.zeros(n, dtype=int)
    points = numpy.zeros(n)

    t0 = time.clock()
    reader = csv.reader(fh, delimiter="\t")
    for row in reader:
        if not row:
            continue
        chrom, coord, point = row
        mappedc = chrommap[chrom]
        chromio.write(mappedc)
        coords[count] = coord
        points[count] = point
        count += 1

    t1 = time.clock()

    print "finished: %f.2" % (t1 - t0)

    $ ./affyspeedtest.py
    reading affy file largefile.txt
    finished: 15.540000.2

Thanks again (tugs forelock),

Peter

--
http://mail.python.org/mailman/listinfo/python-list
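For comparison, here is a minimal sketch (Python 3, and not timed against largefile.txt) of the same loader that collects values in plain lists and converts them to numpy arrays once at the end, instead of assigning strings into preallocated arrays one element at a time. SAMPLE, CHROMMAP (a small excerpt of chrommap) and load_affy are hypothetical names for illustration, and whether this is actually faster on the real file is an assumption that would need measuring.

```python
import csv
import io

import numpy as np

# Hypothetical three-line sample in the same tab-separated layout as largefile.txt.
SAMPLE = "chr1\t3754914\t1.19828\nchr1\t3754950\t1.56557\nchr1\t3754982\t1.52371\n"

# One-character code per chromosome, as in chrommap above
# (assumed excerpt only, not the full map).
CHROMMAP = {"chr1": "1", "chr2": "2", "chrX": "x", "chrY": "y", "chrM": "m"}


def load_affy(fh, chrommap):
    """Parse chrom/coord/point rows into lists, then convert to arrays once."""
    chroms, coords, points = [], [], []
    for row in csv.reader(fh, delimiter="\t"):
        if not row:  # skip blank lines, as the original loop does
            continue
        chrom, coord, point = row
        chroms.append(chrommap[chrom])
        coords.append(int(coord))    # explicit string-to-int conversion
        points.append(float(point))  # explicit string-to-float conversion
    # Single conversion at the end; no need to know the line count up front.
    return "".join(chroms), np.array(coords, dtype=int), np.array(points)


chromstr, coords, points = load_affy(io.StringIO(SAMPLE), CHROMMAP)
print(chromstr, coords.tolist(), points.tolist())
```

Because the arrays are built after the loop, this variant also drops the extra pass over the file that getFileLength makes just to size the arrays.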