oyekomova wrote:
> Thanks for your help. I compared the following code in NumPy with the
> csvread in Matlab for a very large csv file. Matlab read the file in
> 577 seconds. On the other hand, this code below kept running for over 2
> hours. Can this program be made more efficient? FYI - The csv file was
> a simple 6 column file with a header row and more than a million
> records.
>
>
> import csv
> from numpy import array
> import time
> t1=time.clock()
> file_to_read = file('somename.csv','r')
> read_from = csv.reader(file_to_read)
> read_from.next()
>
> datalist = [ map(float, row[:]) for row in read_from ]
>
> # now the real data
> data = array(datalist, dtype = float)
>
> elapsed=time.clock()-t1
> print elapsed
>

If you use numpy.fromfile, you need to skip past the initial header row
yourself. Something like this:

    fid = open('somename.csv')
    data = numpy.fromfile(fid, sep=',').reshape(-1,6)
    # for 6-column data.

-Travis
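
For what it's worth, here is a minimal sketch of the "skip the header row yourself" step. It is not the snippet above: it swaps in numpy.loadtxt for numpy.fromfile, because loadtxt splits rows on newlines without further help. The function name load_numeric_csv and the in-memory sample data are made up for illustration, and the 6-column check just mirrors the file described in the question.

```python
import io

import numpy as np


def load_numeric_csv(fid, ncols=6):
    """Read an all-numeric CSV from an open text file, skipping one header row.

    Hypothetical helper for illustration; `ncols` is the expected column count.
    """
    fid.readline()  # consume the header line so only numeric rows remain
    # ndmin=2 keeps the result 2-D even if the file holds a single data row.
    data = np.loadtxt(fid, delimiter=',', ndmin=2)
    if data.shape[1] != ncols:
        raise ValueError("expected %d columns, got %d" % (ncols, data.shape[1]))
    return data


# Small in-memory stand-in for 'somename.csv' (assumed layout: header + 6 columns).
sample = io.StringIO(
    "a,b,c,d,e,f\n"
    "1,2,3,4,5,6\n"
    "7,8,9,10,11,12\n"
)
data = load_numeric_csv(sample)
print(data.shape)
```

With a real file you would pass open('somename.csv') instead of the StringIO object. Whether this beats the csv-module loop on a million-row file is something to time on your own data rather than take on faith.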