oyekomova wrote:
> csvread in Matlab for a very large csv file. Matlab read the file in
> 577 seconds. On the other hand, this code below kept running for over 2
> hours. Can this program be made more efficient? FYI
There must be something wrong with your setup/program. I work with large
csv files as well, and I never see performance problems of that magnitude.
Make sure you are not doing something else while parsing your data.
Parsing one million lines with six columns with the program below takes
87 seconds on my laptop. Even your original version, with the extra
slices and all, would still only take about 50% more time.

import time, csv, random
from numpy import array

def make_data(rows=1E6, cols=6):
    # write `rows` lines of `cols` random comma-separated floats
    fp = open('data.txt', 'wt')
    counter = range(cols)
    for row in xrange( int(rows) ):
        vals = map(str, [ random.random() for x in counter ])
        fp.write( '%s\n' % ','.join( vals ) )
    fp.close()

def read_test():
    # time the parse: csv.reader -> lists of floats -> numpy array
    start = time.clock()
    reader = csv.reader( file('data.txt') )
    data = [ map(float, row) for row in reader ]
    data = array(data, dtype=float)
    print 'Data size', len(data)
    print 'Elapsed', time.clock() - start

#make_data()
read_test()
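As an aside, and not something the program above depends on: if your
numpy version ships loadtxt, it can replace the csv/map step in
read_test entirely, since it parses the file straight into a 2-D float
array. A minimal sketch against the same data.txt:

import time
import numpy

start = time.clock()
# loadtxt parses the whole file into a 2-D float array;
# availability depends on your numpy version
data = numpy.loadtxt('data.txt', delimiter=',')
print 'Data size', len(data)
print 'Elapsed', time.clock() - start

Also note that time.clock() measures CPU time on most Unix systems but
wall-clock time on Windows, so only compare timings taken on the same
platform.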