I'm using the csv library to process a large amount of data: 28 files, each about 130MB. Just reading the data from one file and loading it into very simple data structures (numpy arrays and a cStringIO) takes around 10 seconds. If I just slurp one file into a string, it only takes about a second, so I/O is not the bottleneck. Is it really taking 9 seconds just to split the lines and set the variables?
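Roughly the kind of measurement I mean, with a placeholder file name and the real per-row work (filling the numpy arrays and the string buffer) stripped out:

    import csv
    import time

    path = "data_00.csv"  # placeholder; each real file is ~130MB

    # Case 1: slurp the whole file into one string -- about a second.
    t0 = time.time()
    with open(path) as f:
        raw = f.read()
    print("slurp:      %.1f s" % (time.time() - t0))

    # Case 2: let csv.reader split every line -- about ten seconds.
    t0 = time.time()
    with open(path) as f:
        for row in csv.reader(f):
            pass  # the real loop stores columns into numpy arrays / a string buffer
    print("csv.reader: %.1f s" % (time.time() - t0))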
Is there some way I can improve the CSV performance? Or is there a way to slurp the file into memory and have csv read it from there as if it were a file?

Peter
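P.S. A rough, untested sketch of what I have in mind for the in-memory approach (placeholder file name; on Python 2 this would presumably use cStringIO.StringIO instead of io.StringIO):

    import csv
    from io import StringIO  # cStringIO.StringIO in Python 2

    path = "data_00.csv"  # placeholder file name

    # Read the whole file into memory once...
    with open(path) as f:
        buf = StringIO(f.read())

    # ...then hand the in-memory buffer to csv.reader as if it were a file.
    # (csv.reader accepts any file-like object or iterable of lines.)
    for row in csv.reader(buf):
        pass  # fill the numpy arrays / string buffer here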