oyekomova wrote:
> Thanks for your help. I compared the following code in NumPy with the
> csvread in Matlab for a very large csv file. Matlab read the file in
> 577 seconds. On the other hand, this code below kept running for over 2
> hours. Can this program be made more efficient? FYI - The csv file was
> a simple 6 column file with a header row and more than a million
> records.
>
>
> import csv
> from numpy import array
> import time
> t1=time.clock()
> file_to_read = file('somename.csv','r')
> read_from = csv.reader(file_to_read)
> read_from.next()
> datalist = [ map(float, row[:]) for row in read_from ]

I'm willing to bet that this is your problem. Python lists are arrays of
object pointers under the hood, so this line builds more than a million
small lists of boxed floats before NumPy ever sees the data. Try
something like this instead, which allocates the array once and fills it
row by row:

import csv
from numpy import empty

# read the whole file in one chunk
# (file_to_read is the open file from your code, before any row is consumed)
lines = file_to_read.readlines()

# count the number of columns, using the first data row
n = lines[1].count(',') + 1

# count the number of rows, excluding the header
m = len(lines) - 1

# allocate
data = empty((m, n), dtype=float)

# create csv reader, skip header
reader = csv.reader(lines[1:])

# read
for i in range(m):
    data[i, :] = [float(x) for x in next(reader)]

And if this is too slow, you may consider replacing the last loop with a
single assignment:

data = empty((m, n), dtype=float)
# strip the line endings before joining; embedded newlines would
# otherwise break the parse of the single long record
newstr = ",".join(line.strip() for line in lines[1:])
flatdata = data.reshape((n*m))  # flatdata is a view of data, not a copy
reader = csv.reader([newstr])
flatdata[:] = [float(x) for x in next(reader)]

I hope this helps!

> Robert Kern wrote:
> > oyekomova wrote:
> > > I would like to know how to convert a csv file with a header row into a
> > > floating point array without the header row.
> >
> > Use the standard library module csv. Something like the following is a
> > cheap and cheerful solution:
> >
> >
> > import csv
> > import numpy
> >
> > def float_array_from_csv(filename, skip_header=True):
> >     f = open(filename)
> >     try:
> >         reader = csv.reader(f)
> >         floats = []
> >         if skip_header:
> >             reader.next()
> >         for row in reader:
> >             floats.append(map(float, row))
> >     finally:
> >         f.close()
> >
> >     return numpy.array(floats)
> >
> > --
> > Robert Kern
> >
> > "I have come to believe that the whole world is an enigma, a harmless enigma
> > that is made terrible by our own mad attempt to interpret it as though it had
> > an underlying truth."
> > -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list
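
P.S. For anyone who wants to try the preallocate-then-fill idea on Python 3, here is a minimal sketch of it as a function. The name csv_to_float_array and the sample data are made up for illustration, and it assumes one header row and plain numeric fields with no quoted commas. I have not timed it against your file, so treat it as a starting point rather than a benchmark winner.

```python
import csv
import io

import numpy as np


def csv_to_float_array(text_stream):
    """Parse CSV text with one header row into a 2-D float array.

    Hypothetical helper: assumes purely numeric fields and no quoted commas.
    """
    lines = text_stream.read().splitlines()
    # Skip the header row and ignore blank lines (e.g. a trailing newline).
    rows = [line for line in lines[1:] if line.strip()]
    if not rows:
        return np.empty((0, 0), dtype=float)

    m = len(rows)                   # number of records
    n = rows[0].count(",") + 1      # number of columns, from the first record

    # Allocate the result once, then fill it row by row.
    data = np.empty((m, n), dtype=float)
    for i, fields in enumerate(csv.reader(rows)):
        data[i, :] = [float(x) for x in fields]
    return data


# Small in-memory example standing in for the real file.
sample = io.StringIO("a,b,c\n1,2,3\n4.5,5,6\n")
result = csv_to_float_array(sample)
```

With a real file you would pass open('somename.csv', newline='') instead of the StringIO object. It is also worth timing numpy.loadtxt with delimiter=',' and skiprows=1 on the same file, since that does the header skipping and conversion for you.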