psaff...@googlemail.com wrote:

> I'm using the CSV library to process a large amount of data - 28
> files, each of 130MB. Just reading in the data from one file and
> filing it into very simple data structures (numpy arrays and a
> cStringIO) takes around 10 seconds. If I just slurp one file into a
> string, it only takes about a second, so I/O is not the bottleneck. Is
> it really taking 9 seconds just to split the lines and set the
> variables?
>
> Is there some way I can improve the CSV performance?
My ideas:

(1) Disable cyclic garbage collection while you read the file into your
data structure:

    import gc

    gc.disable()
    # create many small objects that you want to keep
    gc.enable()

(2) If your data contains only unquoted numbers, use numpy.fromfile().
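For (2), here is a minimal sketch of what that might look like. The file
name "data.csv" and the three-column layout are made up for illustration.
Note that fromfile's text mode parses one flat, uniformly separated stream
of numbers, so this sketch first normalizes the row breaks to commas and
then parses via the related numpy.fromstring:

    import numpy as np

    # Slurp the file and turn the line breaks into commas, so the whole
    # file becomes one flat, comma-separated sequence of numbers.
    text = open("data.csv").read().replace("\n", ",")

    # Parse all the numbers in one call, then restore the row structure
    # (assuming three columns per row).
    data = np.fromstring(text, dtype=float, sep=",").reshape(-1, 3)

Since slurping the file is fast (as you observed), the Python-level work
here is just one replace() and one parsing call; the number conversion
happens in C.

Peter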