Thanks for your help. I compared the following NumPy code with Matlab's csvread on a very large CSV file. Matlab read the file in 577 seconds. The code below, on the other hand, kept running for over 2 hours. Can this program be made more efficient? FYI: the CSV file was a simple 6-column file with a header row and more than a million records.

import csv
from numpy import array
import time

t1 = time.clock()
file_to_read = file('somename.csv', 'r')
read_from = csv.reader(file_to_read)
read_from.next()
datalist = [ map(float, row[:]) for row in read_from ]
# now the real data
data = array(datalist, dtype = float)
elapsed = time.clock() - t1
print elapsed

Robert Kern wrote:
> oyekomova wrote:
> > I would like to know how to convert a csv file with a header row into a
> > floating point array without the header row.
>
> Use the standard library module csv. Something like the following is a cheap and
> cheerful solution:
>
>
> import csv
> import numpy
>
> def float_array_from_csv(filename, skip_header=True):
>     f = open(filename)
>     try:
>         reader = csv.reader(f)
>         floats = []
>         if skip_header:
>             reader.next()
>         for row in reader:
>             floats.append(map(float, row))
>     finally:
>         f.close()
>
>     return numpy.array(floats)
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless enigma
> that is made terrible by our own mad attempt to interpret it as though it had
> an underlying truth."
> -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list
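
One variant that may be worth timing against the csv-module approach is to let NumPy parse the file itself with numpy.loadtxt, which avoids building an intermediate list of lists in Python code. This is only a sketch, not a measured result: it reuses the float_array_from_csv name and signature from the quoted reply, assumes Python 3 syntax (the code in the thread is Python 2), and assumes every field after the header row is a plain number. Whether it is actually faster on a million-row file depends on the NumPy version and would need to be measured.

```python
import numpy as np


def float_array_from_csv(filename, skip_header=True):
    """Load a comma-separated file of numbers into a 2-D float array.

    Sketch only: assumes every field after the optional header row
    parses as a float.
    """
    # skiprows drops the header line when requested.
    # ndmin=2 keeps the result two-dimensional even for a single data row.
    return np.loadtxt(
        filename,
        delimiter=",",
        skiprows=1 if skip_header else 0,
        ndmin=2,
    )
```

Calling float_array_from_csv('somename.csv') would then return an array with one row per record. For timing under Python 3, time.perf_counter() can stand in for time.clock(), which was removed in Python 3.8.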