I'm using the CSV library to process a large amount of data -
28 files, each of 130MB. Just reading in the data from one
file and filing it into very simple data structures (numpy
arrays and a cstringio) takes around 10 seconds. If I just
slurp one file into a string, it only takes about a second, so
I/O is not the bottleneck. Is it really taking 9 seconds just
to split the lines and set the variables?
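Roughly, the two timings look like this (the file name and the
column layout are hypothetical stand-ins for my actual data; I'm
using io.StringIO here in place of cStringIO):

    import csv
    import io
    import time

    import numpy as np

    PATH = "data_000.csv"  # hypothetical file name

    # Baseline: slurp the whole file into one string (~1 s).
    t0 = time.time()
    with open(PATH) as f:
        raw = f.read()
    print("slurp: %.2f s" % (time.time() - t0))

    # Full pass: csv-parse each row and file it into simple
    # structures (~10 s). Column layout is illustrative only.
    t0 = time.time()
    nums = []
    texts = io.StringIO()
    with open(PATH, newline="") as f:
        for row in csv.reader(f):
            nums.append(float(row[0]))  # assumed numeric column
            texts.write(row[1])         # assumed text column
    values = np.array(nums)
    print("parse+fill: %.2f s" % (time.time() - t0))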

You've omitted one important test: spinning through the file with csv parsing, but without filing anything into those "very simple data structures". Without that measurement, there's no way to know whether the csv module is at fault or whether you're doing something slow with the data structures.
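Something like the sketch below would isolate the csv module's
cost (same hypothetical file name as above): if this alone takes
~9 seconds, the parsing is the bottleneck; if it's fast, the time
is going into how you file the rows away.

    import csv
    import time

    PATH = "data_000.csv"  # hypothetical file name

    # Parse every row but do nothing with it.
    t0 = time.time()
    with open(PATH, newline="") as f:
        for row in csv.reader(f):
            pass  # parse only; no data-structure work
    print("csv-only pass: %.2f s" % (time.time() - t0))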

-tkc


