Gary Herron wrote:
[EMAIL PROTECTED] wrote:
On Jun 2, 2:08 am, "kalakouentin" <[EMAIL PROTECTED]> wrote:

 Do you know a way to actually load my data in a more
"batch-like" way, so that I can avoid the constant line-by-line reading?

If your files will fit in memory, you can just do

text = file.readlines()

and Python will read the entire file into a list of strings named
'text,' where each item in the list corresponds to one 'line' of the
file.
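For instance, a minimal self-contained version of that (the filename below is just a placeholder, not from the original posts):

f = open('data.txt')                  # placeholder filename
text = f.readlines()                  # whole file read at once, one string per line
f.close()
print("%d lines read" % len(text))    # each item still ends with its '\n'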

No, that won't help. That has to do *all* the same work (reading blocks and finding line endings) as the iterator, PLUS it has to allocate and build a list.
Better to just use the iterator.

for line in file:
 ...

Actually, this *can* be much slower. Suppose I want to search a file to see whether a substring is present.

st = "some substring that is not actually in the file"
f = <50 MB log file>

Method 1:

for i in file(f):
    if st in i:
        break

--> 0.472416 seconds

Method 2:

Read whole file:

fh = file(f)
rl = fh.read()
fh.close()

--> 0.098834 seconds

"st in rl" test --> 0.037251 (total: .136 seconds)

Method 3:

mmap the file:

import mmap

fh = file(f)   # reopen the file; fh was closed after Method 2
mm = mmap.mmap(fh.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)

"st in mm" test --> 3.589938 seconds (<-- see my post the other day)

mm.find(st) --> 0.186895 seconds
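(The posts above don't show the timing harness itself; the sketch below is one way numbers like these could be reproduced with time.time(). The filename is a placeholder, and the access=ACCESS_READ / encode() details are my additions, not the poster's exact code.)

import mmap
import time

st = "some substring that is not actually in the file"
f = "big.log"                          # placeholder for the 50 MB log file

# Method 1: iterate over the file line by line.
t0 = time.time()
fh = open(f)
for line in fh:
    if st in line:
        break
fh.close()
print("iterate:     %.6f s" % (time.time() - t0))

# Method 2: read the whole file, then a plain substring test on the string.
t0 = time.time()
fh = open(f)
rl = fh.read()
fh.close()
print("read():      %.6f s" % (time.time() - t0))

t0 = time.time()
found = st in rl
print("'in' test:   %.6f s" % (time.time() - t0))

# Method 3: mmap the file and search it with find().
fh = open(f, 'rb')
mm = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
t0 = time.time()
pos = mm.find(st.encode())             # bytes needle; required under Python 3
print("mmap find(): %.6f s" % (time.time() - t0))
mm.close()
fh.close()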

Summary:

If you can afford the memory, it can be more efficient (more than 3 times faster in this example) to read the whole file into memory and process it in one go.

Mmapping the file and processing it at once is roughly as fast (I didn't measure the difference carefully), but it has the advantage that any parts of the file you never touch are not faulted into memory. You could also play more games and mmap chunks at a time to limit memory use, though you'd have to be careful with mappings that don't line up with record boundaries (see the sketch below).
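On the "mmap chunks at a time" point, here is a rough sketch (my own illustration, not code from the post; the function name and window size are made up) of mapping one fixed-size window at a time, with a small overlap so a match spanning a window boundary isn't missed:

import mmap
import os

def chunked_mmap_find(path, needle, window=16 * 1024 * 1024):
    # Offsets passed to mmap must be multiples of ALLOCATIONGRANULARITY,
    # so round the window size to that granularity.
    gran = mmap.ALLOCATIONGRANULARITY
    window = max(window // gran, 1) * gran
    overlap = len(needle) - 1          # so a match straddling a boundary isn't missed
    size = os.path.getsize(path)

    f = open(path, 'rb')
    try:
        pos = 0
        while pos < size:
            length = min(size - pos, window + overlap)
            mm = mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ, offset=pos)
            try:
                hit = mm.find(needle)
                if hit != -1:
                    return pos + hit   # absolute byte offset of the match
            finally:
                mm.close()
            pos += window
        return -1
    finally:
        f.close()

Usage would be something like chunked_mmap_find(f, st.encode()). The window size is rounded because the offset handed to mmap has to be a multiple of mmap.ALLOCATIONGRANULARITY, and the len(needle) - 1 overlap is what keeps a window boundary from cutting a match (or a record) in half.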

Kris