Bengt Richter wrote:
> On Fri, 28 Oct 2005 20:03:17 -0700, [EMAIL PROTECTED] (Alex Martelli)
> wrote:
>
>>Mike Meyer <[EMAIL PROTECTED]> wrote:
>>   ...
>>> Except if you can't read the file into memory because it's too large,
>>> there's a pretty good chance you won't be able to mmap it either. To
>>> deal with huge files, the only option is to read the file in in
>>> chunks, count the occurrences in each chunk, and then do some fiddling
>>> to deal with the pattern landing on a boundary.
>>
>>That's the kind of things generators are for...:
>>
>>def byblocks(f, blocksize, overlap):
>>    block = f.read(blocksize)
>>    yield block
>>    while block:
>>        block = block[-overlap:] + f.read(blocksize-overlap)
>>        if block: yield block
>>
>>Now, to look for a substring of length N in an open binary file f:
>>
>>f = open(whatever, 'b')
>>count = 0
>>for block in byblocks(f, 1024*1024, len(subst)-1):
>>    count += block.count(subst)
>>f.close()
>>
>>not much "fiddling" needed, as you can see, and what little "fiddling"
>>is needed is entirely encompassed by the generator...
>>
> Do I get a job at google if I find something wrong with the above? ;-)

Try it with a subst of length 1. Seems like you missed an opportunity :-)

Peter
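
To spell out the hint: with a one-character subst the overlap is len(subst)-1 == 0, and block[-0:] is the whole block, not an empty string, so every block gets glued onto the next one and its matches are counted again. As far as I can tell the quoted loop also never ends for any overlap, because once f.read() returns nothing the carried-over tail keeps block non-empty. What follows is only a sketch of one way to restructure the generator, not a claim about what Alex intended. It assumes Python 3 bytes objects, and count_in_file is a hypothetical helper name that does not appear in the quoted posts.

```python
import io


def byblocks(f, blocksize, overlap):
    """Yield blocks from binary file f, each prefixed with the last
    `overlap` bytes read so far, so boundary-straddling matches are seen."""
    if not 0 <= overlap < blocksize:
        raise ValueError("need 0 <= overlap < blocksize")
    tail = b""
    while True:
        fresh = f.read(blocksize - overlap)
        if not fresh:
            # End of file: stop instead of re-yielding the leftover tail.
            break
        block = tail + fresh
        yield block
        # block[-0:] is the whole block, so treat overlap == 0 explicitly.
        tail = block[-overlap:] if overlap else b""


def count_in_file(f, subst, blocksize=1024 * 1024):
    """Hypothetical helper: count subst in f, reading block by block."""
    if not subst:
        raise ValueError("subst must not be empty")
    overlap = len(subst) - 1
    return sum(block.count(subst) for block in byblocks(f, blocksize, overlap))


if __name__ == "__main__":
    # A tiny blocksize forces the pattern onto block boundaries.
    data = b"ab" * 5 + b"c" + b"ab" * 3
    print(count_in_file(io.BytesIO(data), b"ab", blocksize=4))
    print(count_in_file(io.BytesIO(b"banana"), b"a", blocksize=2))
```

For a real file you would pass something like open(path, "rb"), with path standing in for whatever file you have; a bare 'b' is not a complete mode string. One caveat: count() only counts non-overlapping occurrences, so for a pattern that can overlap itself (b"aa" in b"aaa", say) the blockwise total can differ from a single count() over the whole file.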