At Wednesday 10/1/2007 16:48, oyekomova wrote:

Thanks for your help. I compared the following NumPy code with
csvread in Matlab on a very large csv file. Matlab read the file in
577 seconds, whereas the code below kept running for over 2
hours. Can this program be made more efficient? FYI, the csv file is
a simple 6-column file with a header row and more than a million
records.


import csv
from numpy import array
import time
t1=time.clock()
file_to_read = file('somename.csv','r')
read_from = csv.reader(file_to_read)
read_from.next()

datalist = [ map(float, row[:]) for row in read_from ]

# now the real data
data = array(datalist, dtype = float)

elapsed=time.clock()-t1
print elapsed

Replace row[:] with row. The slice copies every row before map() sees
it, which is just a waste of time and memory.
Also see http://www.scipy.org/Cookbook/InputOutput
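
A minimal sketch of that change, written in Python 3 syntax rather than the Python 2 of the quoted code. The function name load_csv_floats is hypothetical and not part of the original program; it only illustrates handing each row to float() directly instead of copying it with row[:] first. No speedup figure is claimed here.

```python
import csv


def load_csv_floats(lines):
    """Parse CSV text lines into a list of float rows, skipping one header row.

    `lines` is any iterable of strings, such as an open text file.
    (Hypothetical helper, for illustration only.)
    """
    reader = csv.reader(lines)
    # Skip the header row; this is the Python 3 spelling of reader.next().
    next(reader, None)
    # Convert each field in place; row[:] would copy the list for no benefit.
    return [[float(field) for field in row] for row in reader]
```

With a real file this could be called as load_csv_floats(open('somename.csv', newline='')), and the resulting list passed to array(datalist, dtype=float) exactly as in the quoted code.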


--
Gabriel Genellina
Softlab SRL

        

        
                
-- 
http://mail.python.org/mailman/listinfo/python-list
