Re: CSV performance

2009-04-29 Thread psaff...@googlemail.com
> rows = fh.read().split()
> coords = numpy.array(map(int, rows[1::3]), dtype=int)
> points = numpy.array(map(float, rows[2::3]), dtype=float)
> chromio.writelines(map(chrommap.__getitem__, rows[::3]))

My original version is about 15 seconds. This version is about 9. The chunks version posted...
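A self-contained version of the slicing approach being timed here; the four core lines are from the quote above, while the imports, the chrommap contents, and the three-column file layout are assumptions (Python 2, as in the thread):

    # Sketch of the whole-file split + stride-slicing approach (Python 2).
    # Input layout assumed: three whitespace-separated fields per row --
    # chromosome label, integer coordinate, float value.
    import cStringIO
    import numpy

    chrommap = {'chrX': 'x', 'chrY': 'y'}   # hypothetical label -> code map
    chromio = cStringIO.StringIO()

    fh = open("largefile.txt")              # hypothetical sample file
    rows = fh.read().split()                # one flat list of all fields
    coords = numpy.array(map(int, rows[1::3]), dtype=int)
    points = numpy.array(map(float, rows[2::3]), dtype=float)
    chromio.writelines(map(chrommap.__getitem__, rows[::3]))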

Re: CSV performance

2009-04-29 Thread Lawrence D'Oliveiro
In message , Jorgen Grahn wrote:

> I am asking because people who like databases tend to overestimate the
> time it takes to parse text.

And those of us who regularly load databases from text files, or unload them in the opposite direction, have a good idea of EXACTLY how long it takes to parse...

Re: CSV performance

2009-04-29 Thread Jorgen Grahn
On Mon, 27 Apr 2009 23:56:47 +0200, dean wrote:

> On Mon, 27 Apr 2009 04:22:24 -0700 (PDT), psaff...@googlemail.com wrote:
>> I'm using the CSV library to process a large amount of data - 28
>> files, each of 130MB. Just reading in the data from one file and
>> filing it into very simple data structures...

Re: CSV performance

2009-04-29 Thread Lawrence D'Oliveiro
In message , Peter Otten wrote:

> When I see the sequence
>
>     save state
>     change state
>     do something
>     restore state
>
> I feel compelled to throw in a try ... finally

Yeah, but I try to avoid using exceptions to that extent. :)

Re: CSV performance

2009-04-29 Thread Peter Otten
Lawrence D'Oliveiro wrote:

> In message , Peter Otten wrote:
>
>> gc.disable()
>> # create many small objects that you want to keep
>> gc.enable()
>
> Every time I see something like this, I feel the urge to save the previous
> state and restore it afterwards:
>
>     save_enabled = gc.isenabled()...

Re: CSV performance

2009-04-28 Thread Lawrence D'Oliveiro
In message , Peter Otten wrote:

> gc.disable()
> # create many small objects that you want to keep
> gc.enable()

Every time I see something like this, I feel the urge to save the previous state and restore it afterwards:

    save_enabled = gc.isenabled()
    gc.disable()
    # create many small...
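Putting the two halves of this exchange together — the save/restore above plus Peter Otten's suggested try ... finally — gives a minimal sketch; the allocation loop is a placeholder workload, not from the thread:

    import gc

    save_enabled = gc.isenabled()
    gc.disable()
    try:
        # create many small objects that you want to keep
        # (placeholder workload -- any allocation-heavy code goes here)
        data = [[i] for i in xrange(10 ** 6)]
    finally:
        # restore the collector only if it was enabled to begin with
        if save_enabled:
            gc.enable()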

Re: CSV performance

2009-04-27 Thread dean
On Mon, 27 Apr 2009 04:22:24 -0700 (PDT), psaff...@googlemail.com wrote:

> I'm using the CSV library to process a large amount of data - 28
> files, each of 130MB. Just reading in the data from one file and
> filing it into very simple data structures (numpy arrays and a
> cstringio) takes around 10 seconds...

Re: CSV performance

2009-04-27 Thread Scott David Daniels
psaff...@googlemail.com wrote:

> Thanks for your replies. Many apologies for not including the right
> information first time around. More information is below.

Here is another way to try (untested):

    import numpy
    import time

    chrommap = dict(chrY='y', chrX='x', chr13='c', chr12='b', chr11='a', ...

Re: CSV performance

2009-04-27 Thread Peter Otten
psaff...@googlemail.com wrote:

> Thanks for your replies. Many apologies for not including the right
> information first time around. More information is below.
>
> I have tried running it just on the csv read:
>
>     $ ./largefilespeedtest.py
>     working at file largefile.txt
>     finished: 3.86.2
> ...

Re: CSV performance

2009-04-27 Thread Tim Chase
> I have tried running it just on the csv read:
> ...
>     print "finished: %f.2" % (t1 - t0)

I presume you wanted "%.2f" here. :)

> $ ./largefilespeedtest.py
> working at file largefile.txt
> finished: 3.86.2

So just the CSV processing of the file takes just shy of 4 seconds, and you said that just...
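For illustration, the difference between the two format strings: "%f" defaults to six decimal places, and in "%f.2" the trailing ".2" is literal text rather than a precision:

    t = 3.8617  # hypothetical elapsed time
    print "finished: %f.2" % t   # finished: 3.861700.2  (".2" printed literally)
    print "finished: %.2f" % t   # finished: 3.86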

Re: CSV performance

2009-04-27 Thread Peter Otten
grocery_stocker wrote:

> On Apr 27, 5:15 am, Peter Otten <__pete...@web.de> wrote:
>> psaff...@googlemail.com wrote:
>>> I'm using the CSV library to process a large amount of data - 28
>>> files, each of 130MB. Just reading in the data from one file and
>>> filing it into very simple data structures...

Re: CSV performance

2009-04-27 Thread grocery_stocker
On Apr 27, 5:15 am, Peter Otten <__pete...@web.de> wrote:

> psaff...@googlemail.com wrote:
>> I'm using the CSV library to process a large amount of data - 28
>> files, each of 130MB. Just reading in the data from one file and
>> filing it into very simple data structures (numpy arrays and a
>> cstringio)...

Re: CSV performance

2009-04-27 Thread psaff...@googlemail.com
Thanks for your replies. Many apologies for not including the right information first time around. More information is below.

I have tried running it just on the csv read:

    import time
    import csv

    afile = "largefile.txt"

    t0 = time.clock()
    print "working at file", afile
    reader = csv.reader(open(afile))
    ...
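A runnable reconstruction of this timing script — the preview cuts off mid-line, so the iteration loop and the corrected format string (per Tim Chase's follow-up) are assumptions:

    import csv
    import time

    afile = "largefile.txt"  # hypothetical sample file

    t0 = time.clock()
    print "working at file", afile
    reader = csv.reader(open(afile))
    for row in reader:       # assumed: iterate rows to exercise the parser
        pass
    t1 = time.clock()
    print "finished: %.2f" % (t1 - t0)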

Re: CSV performance

2009-04-27 Thread Tim Chase
> I'm using the CSV library to process a large amount of data - 28
> files, each of 130MB. Just reading in the data from one file and
> filing it into very simple data structures (numpy arrays and a
> cstringio) takes around 10 seconds. If I just slurp one file into a
> string, it only takes about a second, ...

Re: CSV performance

2009-04-27 Thread Peter Otten
psaff...@googlemail.com wrote:

> I'm using the CSV library to process a large amount of data - 28
> files, each of 130MB. Just reading in the data from one file and
> filing it into very simple data structures (numpy arrays and a
> cstringio) takes around 10 seconds. If I just slurp one file into
> a string, it only takes about a second...

Re: CSV performance

2009-04-27 Thread John Machin
On Apr 27, 9:22 pm, "psaff...@googlemail.com" wrote: > I'm using the CSV library to process a large amount of data - 28 > files, each of 130MB. Just reading in the data from one file and > filing it into very simple data structures (numpy arrays and a > cstringio) takes around 10 seconds. If I jus