I am starting to use numpy and have written a hack for reading in a large binary data set that has 8 columns and millions of rows. I want to read and process a single column. I have written the very ugly hack below, but am sure there is a more efficient and Pythonic way to do this. The file is too big to read in one go and then select a column, so it is read in chunks and the column selected from each chunk. Things I don't like in the code:

1. Performing a transpose on a large array.
2. Uncertainty about the efficiency of numpy's append.
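Both of those can be sidestepped: slicing data[:, colNum] after the reshape picks out the column without a transpose, and collecting the chunk slices in a list and concatenating once at the end avoids regrowing the array with append (which copies all of z on every iteration). A minimal sketch of the same loop, using numpy.fromfile in place of scipy's fread and assuming the same raw, native-endian int16 layout with no header:

import numpy as np

fd = open('testcase.bin', 'rb')
M, N = 1000000, 8        # rows per chunk, columns
colNum = 2
sf = 1.645278e-04 * 10
pieces = []
for i in xrange(50):
    # read one chunk as a flat int16 array and reshape to (rows, cols)
    data = np.fromfile(fd, dtype=np.int16, count=M * N).reshape((M, N))
    pieces.append(data[:, colNum] * sf)   # slice the column, no transpose
z = np.concatenate(pieces)                # single allocation at the end
print z.mean()
fd.close()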
Is there a way to directly read every nth element from the file into an array?

david

from numpy import *
from scipy.io.numpyio import fread

fd = open('testcase.bin', 'rb')
datatype = 'h'           # 16-bit integers
byteswap = 0
M = 1000000              # rows per chunk
N = 8                    # columns
size = M * N
shape = (M, N)
colNum = 2
sf = 1.645278e-04 * 10   # scale factor
z = array([])
for i in xrange(50):
    data = fread(fd, size, datatype, datatype, byteswap)
    data = data.reshape(shape)
    data = data.transpose()
    z = append(z, data[colNum] * sf)
print z.mean()
fd.close()
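As for reading every nth element directly: a memory map gives a strided view into the file, so one column can be sliced out without an explicit chunk loop, and the OS pages the data in on demand. A minimal sketch, assuming the file holds exactly 50 chunks of the same raw int16 layout:

import numpy as np

M, N = 1000000, 8
colNum = 2
sf = 1.645278e-04 * 10
rows = 50 * M    # total rows in the file
data = np.memmap('testcase.bin', dtype=np.int16, mode='r', shape=(rows, N))
z = data[:, colNum] * sf   # pulls just the one column into memory
print z.mean()

The slice data[:, colNum] is still only a view of the map; multiplying by sf is what materialises the column as an ordinary in-memory array.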