On Mon, 27 Apr 2009 04:22:24 -0700 (PDT), psaff...@googlemail.com wrote:

> I'm using the CSV library to process a large amount of data - 28
> files, each of 130MB. Just reading in the data from one file and
> filing it into very simple data structures (numpy arrays and a
> cStringIO) takes around 10 seconds. If I just slurp one file into a
> string, it only takes about a second, so I/O is not the bottleneck. Is
> it really taking 9 seconds just to split the lines and set the
> variables?
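Quite possibly, yes. A quick check will tell you how much of the 10 seconds is csv.reader itself, before your numpy/cStringIO filing even starts. Rough sketch - 'data.csv' stands in for one of your files:

import csv
import time

FILENAME = 'data.csv'  # substitute one of your 130 MB files

# Raw slurp: just pull the bytes into memory.
t0 = time.time()
data = open(FILENAME, 'rb').read()
print('slurp:      %.2f s' % (time.time() - t0))

# Parse only: run csv.reader over every row and throw it away.
t0 = time.time()
f = open(FILENAME, 'rb')
for row in csv.reader(f):
    pass
f.close()
print('csv.reader: %.2f s' % (time.time() - t0))

If the second timing alone eats most of your 10 seconds, the cost really is in tokenising the lines, not in I/O or in your data structures.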
I assume you're reading a 130 MB text file in 1 second only because the OS has already cached it, so you're not really measuring disk I/O at all. Parsing 130 MB of text will take considerable time no matter what. Perhaps you should consider using a database instead of CSV.
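If these are files you re-read many times, even sqlite3 from the standard library gets you out of re-parsing the text on every run - you pay the CSV parsing cost once at import time and query binary storage afterwards. Rough sketch; the table name and three-column layout are invented, adjust to your data:

import csv
import sqlite3

conn = sqlite3.connect('data.db')
conn.execute('CREATE TABLE IF NOT EXISTS measurements (a REAL, b REAL, c REAL)')

# One-off import: csv.reader yields one list per row, which
# executemany feeds straight into the INSERT statement.
f = open('data.csv', 'rb')
conn.executemany('INSERT INTO measurements VALUES (?, ?, ?)', csv.reader(f))
f.close()
conn.commit()

After that, pulling out the columns you need is one query per file instead of 10 seconds of parsing per file.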