Mike Meyer <[EMAIL PROTECTED]> wrote:
   ...
> Except if you can't read the file into memory because it's too large,
> there's a pretty good chance you won't be able to mmap it either. To
> deal with huge files, the only option is to read the file in
> chunks, count the occurrences in each chunk, and then do some fiddling
> to deal with the pattern landing on a boundary.

That's the kind of thing generators are for...:

    def byblocks(f, blocksize, overlap):
        # Yield successive blocks of f; every block after the first starts
        # with the last `overlap` bytes of the previous block.
        block = f.read(blocksize)
        yield block
        while True:
            data = f.read(blocksize - overlap)
            if not data:
                break   # EOF: stop instead of re-yielding the tail forever
            block = block[len(block) - overlap:] + data
            yield block

Now, to count occurrences of a substring subst of length N in a binary
file:

    f = open(whatever, 'rb')
    count = 0
    for block in byblocks(f, 1024*1024, len(subst)-1):
        count += block.count(subst)
    f.close()

not much "fiddling" needed, as you can see, and what little "fiddling" is
needed is entirely encompassed by the generator...


Alex
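
For anyone who wants to check the boundary handling without a huge file, here is a minimal self-contained sketch of the same chunk-plus-overlap idea, run against an in-memory io.BytesIO object. The helper name count_in_chunks, the sample data, and the tiny block sizes are assumptions made up for illustration; they are not from the thread. One caveat: count() counts non-overlapping matches, so for a pattern that can overlap itself (say b"aa") the chunked total may differ from calling count() on the whole data.

```python
import io

def count_in_chunks(f, subst, blocksize=1024 * 1024):
    # Hypothetical helper: count occurrences of subst in the binary
    # file object f without reading the whole file into memory.
    if not subst:
        raise ValueError("subst must be non-empty")
    overlap = len(subst) - 1
    if blocksize <= overlap:
        raise ValueError("blocksize must be larger than len(subst) - 1")
    block = f.read(blocksize)
    count = block.count(subst)
    while True:
        data = f.read(blocksize - overlap)
        if not data:
            break  # end of file
        # Carry over the last `overlap` bytes of the previous block so a
        # match straddling the boundary is seen. A complete match cannot
        # fit inside those bytes, so nothing is counted twice.
        block = block[len(block) - overlap:] + data
        count += block.count(subst)
    return count

# Tiny block size on purpose, so matches land on block boundaries.
sample = b"xxabcxxxabcabcx"
print(count_in_chunks(io.BytesIO(sample), b"abc", blocksize=4))  # → 3
```

With a real file you would pass open(path, 'rb') in place of the BytesIO object and leave blocksize at its default.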