psaff...@googlemail.com wrote:

> I'm using the CSV library to process a large amount of data - 28
> files, each of 130MB. Just reading in the data from one file and
> filing it into very simple data structures (numpy arrays and a
> cStringIO) takes around 10 seconds. If I just slurp one file into a
> string, it only takes about a second, so I/O is not the bottleneck. Is
> it really taking 9 seconds just to split the lines and set the
> variables?
>
> Is there some way I can improve the CSV performance?
My ideas:

(1) Disable cyclic garbage collection while you read the file into your
data structure:

    import gc

    gc.disable()
    # create many small objects that you want to keep
    gc.enable()

(2) If your data contains only unquoted numbers, use numpy.fromfile().
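For (2), here is a minimal sketch of what that might look like. The file
name "data.csv" and the three-column layout are made up for illustration.
Note that fromfile's text mode parses one flat, uniformly separated stream
of numbers, so this sketch first normalizes the row breaks to commas and
then parses via the related numpy.fromstring:

    import numpy as np

    # Slurp the file and turn the line breaks into commas, so the whole
    # file becomes one flat, comma-separated sequence of numbers.
    text = open("data.csv").read().replace("\n", ",")

    # Parse all the numbers in one call, then restore the row structure
    # (assuming three columns per row).
    data = np.fromstring(text, dtype=float, sep=",").reshape(-1, 3)

Since slurping the file is fast (as you observed), the Python-level work
here is just one replace() and one parsing call; the number conversion
happens in C.

Peter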