Re: Scanning a file

Alex Martelli Fri, 28 Oct 2005 20:05:46 -0700

Mike Meyer <[EMAIL PROTECTED]> wrote:
   ...
> Except if you can't read the file into memory because it's too large,
> there's a pretty good chance you won't be able to mmap it either.  To
> deal with huge files, the only option is to read the file in chunks,
> count the occurrences in each chunk, and then do some fiddling to deal
> with the pattern landing on a boundary.


That's the kind of thing generators are for:

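def byblocks(f, blocksize, overlap):
    # First block, then each later block = last `overlap` bytes + new data
    block = f.read(blocksize)
    if block:
        yield block
    while True:
        data = f.read(blocksize - overlap)
        if not data:  # EOF: stop (re-yielding the tail would loop forever)
            break
        # max(..., 0) keeps overlap == 0 working; block[-0:] is the whole block
        block = block[max(len(block) - overlap, 0):] + data
        yield block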
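To see what the generator yields, it can be run over a small in-memory file with tiny sizes. This is only a sketch: io.BytesIO stands in for a real binary file, and this version of byblocks is written to stop at end of file and to accept an overlap of 0 (where a slice like block[-0:] would return the whole block rather than an empty tail).

```python
import io

def byblocks(f, blocksize, overlap):
    # Yield the first block, then blocks that begin with the last
    # `overlap` bytes of the previous block followed by fresh data.
    block = f.read(blocksize)
    if block:
        yield block
    while True:
        data = f.read(blocksize - overlap)
        if not data:  # end of file: nothing new to scan
            break
        block = block[max(len(block) - overlap, 0):] + data
        yield block

blocks = list(byblocks(io.BytesIO(b"abcdefghij"), 4, 1))
print(blocks)  # [b'abcd', b'defg', b'ghij']
```

Each block after the first repeats one byte (the overlap) from the end of the previous block, so a 2-byte pattern split across a boundary still appears whole in one block.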

Now, to count the occurrences of a substring subst, of length N, in a binary file:

count = 0
with open(whatever, 'rb') as f:   # 'rb': read-only, binary mode
    for block in byblocks(f, 1024*1024, len(subst)-1):
        count += block.count(subst)   # subst must be a bytes object

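Not much "fiddling" is needed, as you can see, and what little "fiddling"
is needed is entirely encompassed by the generator.  An overlap of N-1
bytes is exactly enough: an occurrence that straddles a block boundary
always lies wholly inside the next block, and the N-1 repeated bytes are
too short to hold a full occurrence, so none is counted twice (as long as
the pattern cannot overlap itself, since .count only counts
non-overlapping matches).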
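One way to check the boundary handling is to compare the block-wise total against a plain .count over the whole data, trying many block sizes so that some of them split the pattern across a boundary. In this sketch count_occurrences is a hypothetical helper name, io.BytesIO stands in for a real file, and the test pattern b"needle" was chosen because it cannot overlap itself.

```python
import io

def byblocks(f, blocksize, overlap):
    # byblocks, written to stop at end of file and to allow overlap == 0.
    block = f.read(blocksize)
    if block:
        yield block
    while True:
        data = f.read(blocksize - overlap)
        if not data:
            break
        block = block[max(len(block) - overlap, 0):] + data
        yield block

def count_occurrences(f, subst, blocksize=1024 * 1024):
    # Hypothetical helper: count non-overlapping occurrences of subst by
    # scanning f in blocks that overlap by len(subst) - 1 bytes.
    # Assumes blocksize > len(subst) - 1.
    return sum(block.count(subst)
               for block in byblocks(f, blocksize, len(subst) - 1))

data = b"x" * 10 + b"needle" + b"x" * 7 + b"needle" + b"xx"
# Every block size from 6 to 19 must agree with counting the whole buffer.
for size in range(6, 20):
    assert count_occurrences(io.BytesIO(data), b"needle", size) == data.count(b"needle")
print(count_occurrences(io.BytesIO(data), b"needle", 8))  # 2
```

With blocksize 6 and overlap 5 the generator degenerates into a sliding window of exactly one pattern length, which is the worst case for boundaries and still gives the same total.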


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list
