On Apr 27, 5:15 am, Peter Otten <__pete...@web.de> wrote:
> psaff...@googlemail.com wrote:
> > I'm using the CSV library to process a large amount of data - 28
> > files, each of 130MB. Just reading in the data from one file and
> > filing it into very simple data structures (numpy arrays and a
> > cStringIO) takes around 10 seconds. If I just slurp one file into a
> > string, it only takes about a second, so I/O is not the bottleneck.
> > Is it really taking 9 seconds just to split the lines and set the
> > variables?
> >
> > Is there some way I can improve the CSV performance?
>
> My ideas:
>
> (1) Disable cyclic garbage collection while you read the file into
> your data structure:
>
> import gc
>
> gc.disable()
> # create many small objects that you want to keep
> gc.enable()
>
> (2) If your data contains only numerical data without quotes use
>
> numpy.fromfile()
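Concretely, I read suggestion (1) as something like the sketch below (the filename and the list-of-rows handling are my own placeholders, not from the post above; the try/finally just makes sure collection comes back on even if the read fails):

import csv
import gc

def load_rows(path):
    # Pause the cyclic collector for the duration of the bulk read.
    # Reference counting still frees objects as usual; only the
    # periodic cycle-detection passes are suspended.
    gc.disable()
    try:
        return list(csv.reader(open(path)))
    finally:
        gc.enable()  # re-enable collection no matter what happened

rows = load_rows("data.csv")  # hypothetical filename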
How would disabling the cyclic garbage collection make it go faster in this case?