>
> rows = fh.read().split()
> coords = numpy.array(map(int, rows[1::3]), dtype=int)
> points = numpy.array(map(float, rows[2::3]), dtype=float)
> chromio.writelines(map(chrommap.__getitem__, rows[::3]))
>
My original version is about 15 seconds. This version is about 9. The
chunks version posted
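
For context, a self-contained version of that whole-file approach might look roughly like this. This is only a sketch: the file layout (three whitespace-separated fields per record), the output buffer, and the full chrommap contents are assumptions, since the dict is truncated in the archive.

import cStringIO
import numpy

# only the mappings visible in the thread; the real dict is longer
chrommap = dict(chrY='y', chrX='x', chr13='c', chr12='b', chr11='a')

chromio = cStringIO.StringIO()
fh = open("largefile.txt", "rb")

rows = fh.read().split()                                    # one big read, split on whitespace
coords = numpy.array(map(int, rows[1::3]), dtype=int)       # every third field, starting at index 1
points = numpy.array(map(float, rows[2::3]), dtype=float)   # every third field, starting at index 2
chromio.writelines(map(chrommap.__getitem__, rows[::3]))    # chromosome names -> short codes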
In message , Jorgen Grahn wrote:
> I am asking because people who like databases tend to overestimate the
> time it takes to parse text.
And those of us who regularly load databases from text files, or unload them
in the opposite direction, have a good idea of EXACTLY how long it takes to
parse text.
On Mon, 27 Apr 2009 23:56:47 +0200, dean wrote:
> On Mon, 27 Apr 2009 04:22:24 -0700 (PDT), psaff...@googlemail.com wrote:
>
>> I'm using the CSV library to process a large amount of data - 28
>> files, each of 130MB. Just reading in the data from one file and
>> filing it into very simple data structures (numpy arrays and a
>> cstringio) takes around 10 seconds.
In message , Peter Otten wrote:
> When I see the sequence
>
> save state
> change state
> do something
> restore state
>
> I feel compelled to throw in a try ... finally
Yeah, but I try to avoid using exceptions to that extent. :)
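
For reference, a minimal sketch of that save/change/do/restore sequence with the try ... finally Peter describes, using the gc calls from this thread; the million-object loop is only a placeholder for the real work.

import gc

save_enabled = gc.isenabled()                    # save state
gc.disable()                                     # change state
try:
    objs = [object() for _ in range(10 ** 6)]    # do something: create many small objects
finally:
    if save_enabled:                             # restore state, even if the work raises
        gc.enable()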
Lawrence D'Oliveiro wrote:
> In message , Peter Otten wrote:
>
>> gc.disable()
>> # create many small objects that you want to keep
>> gc.enable()
>
> Every time I see something like this, I feel the urge to save the previous
> state and restore it afterwards:
>
> save_enabled = gc.isenabled()
In message , Peter Otten wrote:
> gc.disable()
> # create many small objects that you want to keep
> gc.enable()
Every time I see something like this, I feel the urge to save the previous
state and restore it afterwards:
save_enabled = gc.isenabled()
gc.disable()
# create many small objects that you want to keep
if save_enabled:
    gc.enable()
On Mon, 27 Apr 2009 04:22:24 -0700 (PDT), psaff...@googlemail.com wrote:
> I'm using the CSV library to process a large amount of data - 28
> files, each of 130MB. Just reading in the data from one file and
> filing it into very simple data structures (numpy arrays and a
> cstringio) takes around 10 seconds.
psaff...@googlemail.com wrote:
> Thanks for your replies. Many apologies for not including the right
> information first time around. More information is below.
Here is another way to try (untested):
import numpy
import time
chrommap = dict(chrY='y', chrX='x', chr13='c', chr12='b', chr11='a',
psaff...@googlemail.com wrote:
> Thanks for your replies. Many apologies for not including the right
> information first time around. More information is below.
>
> I have tried running it just on the csv read:
> $ ./largefilespeedtest.py
> working at file largefile.txt
> finished: 3.86.2
>
> I have tried running it just on the csv read:
> ...
> print "finished: %f.2" % (t1 - t0)
I presume you wanted "%.2f" here. :)
> $ ./largefilespeedtest.py
> working at file largefile.txt
> finished: 3.86.2
So just the CSV processing of the file takes just shy of 4 seconds, and
you said that just slurping one file into a string takes about a second.
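
For reference, what the two format strings actually do (plain interactive Python 2, matching the thread):

>>> "finished: %f.2" % 3.86   # %f expands to six decimal places; the ".2" is literal text
'finished: 3.860000.2'
>>> "finished: %.2f" % 3.86   # %.2f rounds to two decimal places, as intended
'finished: 3.86'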
>> > Is it really taking 9 seconds just to split the lines and set the
>> > variables?
>>
>> > Is there some way I can improve the CSV performance?
>>
>> My ideas:
>>
>> (1) Disable cyclic garbage collection while you read the file into your
>> data structure:
>>
>> import gc
>> gc.disable()
>> # create many small objects that you want to keep
>> gc.enable()
> > filing it into very simple data structures (numpy arrays and a
> > cstringio) takes around 10 seconds. If I just slurp one file into a
> > string, it only takes about a second, so I/O is not the bottleneck. Is
> > it really taking 9 seconds just to split the lines and set the
> > variables?
>
> >
Thanks for your replies. Many apologies for not including the right
information first time around. More information is below.
I have tried running it just on the csv read:
import time
import csv
afile = "largefile.txt"
t0 = time.clock()
print "working at file", afile
reader = csv.reader(open(afile))
for row in reader:
    pass
t1 = time.clock()
print "finished: %f.2" % (t1 - t0)
I'm using the CSV library to process a large amount of data -
28 files, each of 130MB. Just reading in the data from one
file and filing it into very simple data structures (numpy
arrays and a cstringio) takes around 10 seconds. If I just
slurp one file into a string, it only takes about a second, so I/O is
not the bottleneck.
> If I just slurp one file into a
> string, it only takes about a second, so I/O is not the bottleneck. Is
> it really taking 9 seconds just to split the lines and set the
> variables?
>
> Is there some way I can improve the CSV performance?
My ideas:
(1) Disable cyclic garbage collection while you read the file into your
data structure:
import gc
gc.disable()
# create many small objects that you want to keep
gc.enable()
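
Applied to the posted test, that might look roughly like the following sketch (Python 2, matching the thread); the plain list is only a stand-in for the real numpy/cstringio structures.

import csv
import gc

gc.disable()                                    # no cyclic GC while the big structure is built
reader = csv.reader(open("largefile.txt", "rb"))
rows = [row for row in reader]                  # stand-in for the real filing work
gc.enable()                                     # switch collection back on afterwards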
Are you sure it's a problem with the csv module and not with the "filing it
into very simple data structures"? How long does it take just to read
the CSV file, i.e. without setting any of the variables? Have you run your
timing tests multiple times and discarded the first one or two results?
> Is there some way I can improve the CSV performance?
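
A hypothetical helper (not from the thread) for the "time it several times and discard the first run" advice; the first pass mostly measures the cold OS file cache rather than the parsing itself.

import time

def time_runs(parse, path, repeats=3):
    timings = []
    for _ in range(repeats):
        t0 = time.clock()
        parse(path)                        # whatever parsing function is being measured
        timings.append(time.clock() - t0)
    return timings[1:]                     # drop the first, cold-cache, run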
If I just slurp one file into a string, it only takes about a second, so
I/O is not the bottleneck. Is
it really taking 9 seconds just to split the lines and set the
variables?
Is there some way I can improve the CSV performance? Is there a way I
can slurp the file into memory and read it like a file from there?
Peter
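
On the last question, a minimal sketch of slurping the file and reading it "like a file" from memory via cStringIO (Python 2, matching the thread); csv.reader accepts the in-memory buffer directly.

import csv
import cStringIO

data = open("largefile.txt", "rb").read()        # the one-second slurp
reader = csv.reader(cStringIO.StringIO(data))    # read it like a file, from memory
for row in reader:
    pass                                         # filing into the real structures goes here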