On Mon, 27 Apr 2009 23:56:47 +0200, dean <de...@yahoo.com> wrote:
> On Mon, 27 Apr 2009 04:22:24 -0700 (PDT), psaff...@googlemail.com wrote:
>
>> I'm using the CSV library to process a large amount of data - 28
>> files, each of 130 MB. Just reading in the data from one file and
>> filing it into very simple data structures (numpy arrays and a
>> cStringIO) takes around 10 seconds. If I just slurp one file into a
>> string, it only takes about a second, so I/O is not the bottleneck.
>> Is it really taking 9 seconds just to split the lines and set the
>> variables?
>
> I assume you're reading a 130 MB text file in 1 second only after the
> OS has already cached it, so you're not really measuring disk I/O at
> all.
>
> Parsing a 130 MB text file will take considerable time no matter what.
> Perhaps you should consider using a database instead of CSV.
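For what it's worth, a rough way to check where those 10 seconds go is to
time the slurp and the csv pass separately. This is only a sketch, in
Python 2 style to match the era of the thread; "data.csv" is a made-up
name standing in for one of the 130 MB files:

# Rough timing harness: how long does a plain slurp take compared to a
# csv.reader pass over the same file?  "data.csv" is a placeholder name.
import csv
import time

FILENAME = "data.csv"   # stand-in for one of the 130 MB files

# 1. Slurp the whole file into one string (mostly I/O and allocation).
t0 = time.time()
f = open(FILENAME, "rb")
raw = f.read()
f.close()
print("slurp:      %.2f s  (%d bytes)" % (time.time() - t0, len(raw)))

# 2. Let the csv module split every line into fields.
t0 = time.time()
f = open(FILENAME, "rb")
nrows = 0
for row in csv.reader(f):
    nrows += 1
f.close()
print("csv.reader: %.2f s  (%d rows)" % (time.time() - t0, nrows))

If the second figure dwarfs the first even with the file in the OS cache,
the time really is going into splitting lines and building row objects.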
Why would a database be faster? (Assuming all the data is actually read
from the database into data structures in the program, as in the text
file case.)

I am asking because people who like databases tend to overestimate the
time it takes to parse text. (And I guess people like me, who prefer
text files, tend to underestimate the usefulness of databases.)

/Jorgen

-- 
  // Jorgen Grahn <grahn@       Ph'nglui mglw'nafh Cthulhu
\X/   snipabacken.se>           R'lyeh wgah'nagl fhtagn!
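P.S. If anyone wants to put numbers on this instead of guessing, a rough
comparison could look like the sketch below. The file, database and
table names are invented, and it assumes the same CSV has already been
imported into a SQLite table once; it only times re-reading the rows
both ways.

# Compare re-reading the same rows from a SQLite file against re-parsing
# the CSV.  All names below are invented for the example.
import csv
import sqlite3
import time

CSV_FILE = "data.csv"     # hypothetical 130 MB CSV file
DB_FILE = "data.sqlite"   # hypothetical SQLite copy of the same rows

def time_csv():
    t0 = time.time()
    f = open(CSV_FILE, "rb")
    nrows = sum(1 for _ in csv.reader(f))
    f.close()
    return time.time() - t0, nrows

def time_sqlite():
    t0 = time.time()
    conn = sqlite3.connect(DB_FILE)
    # Assumes the CSV was loaded into a table called "data" beforehand.
    nrows = sum(1 for _ in conn.execute("SELECT * FROM data"))
    conn.close()
    return time.time() - t0, nrows

for name, func in (("csv", time_csv), ("sqlite", time_sqlite)):
    secs, nrows = func()
    print("%-7s %7.2f s  %d rows" % (name, secs, nrows))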