David Lees wrote:

> I am starting to use numpy and have written a hack for reading in a
> large data set that has 8 columns and millions of rows. I want to read
> and process a single column. I have written the very ugly hack below,
> but am sure there is a more efficient and pythonic way to do this. The
> file is too big to read by brute force and select a column, so it is
> read in chunks and the column selected.
>
> Things I don't like in the code:
>
> 1. Performing a transpose on a large array
Transposition is trivially fast in numpy. It does not copy any memory.

> 2. Uncertainty about numpy append efficiency

Rest assured that it's slow. Appending to lists is fast since lists
preallocate memory according to a scheme such that the amortized cost of
appending elements is O(1). We don't quite have that luxury in numpy.

> Is there a way to directly read every n'th element from the file into an
> array?

Since this is a regular binary file, you can memory map the file.

import numpy

M = 1000000
N = 8
column = 2
sf = 1.645278e-04 * 10

m = numpy.memmap('testcase.bin', dtype=numpy.int16, shape=(M, N))
z = m[:, column] * sf

You may want to ask future numpy questions on the numpy mailing list.

  http://www.scipy.org/Mailing_Lists

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
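
P.S. If you want to convince yourself of the two points above, here is a
minimal self-contained sketch (the file name 'testcase.bin', the small
sizes, and the sample values are placeholders, not your real data). It
shows that a transpose is a no-copy view, and that a single column can be
read straight out of a memory-mapped binary file:

import numpy

# A transpose is just a view: it shares the original array's memory.
a = numpy.zeros((3, 4), dtype=numpy.int16)
print(a.T.base is a)           # True -- same buffer as a
print(a.T.flags['OWNDATA'])    # False -- the view owns no data of its own

# Write a small binary file with the same row layout (N int16 values per
# row), then memory map it and pull out one column.
M, N, column = 1000, 8, 2
numpy.arange(M * N, dtype=numpy.int16).reshape(M, N).tofile('testcase.bin')

m = numpy.memmap('testcase.bin', dtype=numpy.int16, mode='r', shape=(M, N))
z = m[:, column] * 1.645278e-04 * 10   # same scale factor as above
print(z[:5])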