On Apr 27, 5:15 am, Peter Otten <__pete...@web.de> wrote:
> psaff...@googlemail.com wrote:
> > I'm using the CSV library to process a large amount of data - 28
> > files, each of 130MB. Just reading in the data from one file and
> > filing it into very simple data structures (numpy arrays and a
> > cStringIO) takes around 10 seconds. If I just slurp one file into a
> > string, it only takes about a second, so I/O is not the bottleneck.
> > Is it really taking 9 seconds just to split the lines and set the
> > variables?
> >
> > Is there some way I can improve the CSV performance?
>
> My ideas:
>
> (1) Disable cyclic garbage collection while you read the file into
> your data structure:
>
> import gc
>
> gc.disable()
> # create many small objects that you want to keep
> gc.enable()
>
> (2) If your data contains only numerical data without quotes use
>
> numpy.fromfile()
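Concretely, I read suggestion (1) as something like the sketch below (the filename and the list-of-rows handling are my own placeholders, not from the post above; the try/finally just makes sure collection comes back on even if the read fails):

import csv
import gc

def load_rows(path):
    # Pause the cyclic collector for the duration of the bulk read.
    # Reference counting still frees objects as usual; only the
    # periodic cycle-detection passes are suspended.
    gc.disable()
    try:
        return list(csv.reader(open(path)))
    finally:
        gc.enable()  # re-enable collection no matter what happened

rows = load_rows("data.csv")  # hypothetical filename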
How would disabling the cyclic garbage collection make it go faster in this case?