"Paul Watson" <[EMAIL PROTECTED]> writes:
> Here is a better one that counts, and not just detects, the substring. This
> is -much- faster than using mmap; especially for a large file that may cause
> paging to start. Using mmap can be -very- slow.
>
> #!/usr/bin/env python
> import sys
>
> fn = 't2.dat'
> ss = '\x00\x00\x01\x00'
>
> be = len(ss) - 1  # length of overlap to check
> blocksize = 64 * 1024  # need to ensure that blocksize > overlap
>
> fp = open(fn, 'rb')
> b = fp.read(blocksize)
> count = 0
> while len(b) > be:
>     count += b.count(ss)
>     b = b[-be:] + fp.read(blocksize)
> fp.close()
>
> print count
> sys.exit(0)
>
Did you do timings on it vs. mmap? Having to copy the data multiple
times to deal with the overlap - thanks to strings being immutable -
would seem to be a lose, and makes me wonder how it could be faster
than mmap in general.

        Thanks,
        <mike
-- 
Mike Meyer <[EMAIL PROTECTED]>    http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
-- 
http://mail.python.org/mailman/listinfo/python-list
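
One way to settle the timing question is to run both approaches against the same file and compare. What follows is only a rough sketch in Python 3, not the code from the quoted post: the names count_chunked, count_mmap, and timed are made up for illustration, and the numbers will depend heavily on file size, the OS page cache, and available memory. Note also that the two counts can disagree when the search string can overlap itself across a block boundary, since each method counts non-overlapping matches from a different starting point.

```python
import mmap
import sys
import time


def count_chunked(path, needle, blocksize=64 * 1024):
    """Count matches by reading fixed-size blocks (assumes blocksize > len(needle) - 1)."""
    overlap = len(needle) - 1  # bytes carried over so boundary matches are seen
    count = 0
    with open(path, "rb") as fp:
        buf = fp.read(blocksize)
        while len(buf) > overlap:
            count += buf.count(needle)
            # buf[-0:] would be the whole buffer, so handle overlap == 0 explicitly.
            tail = buf[-overlap:] if overlap else b""
            buf = tail + fp.read(blocksize)  # this concatenation is the extra copy
    return count


def count_mmap(path, needle):
    """Count non-overlapping matches by searching a read-only memory map."""
    count = 0
    with open(path, "rb") as fp:
        with mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(needle)
            while pos != -1:
                count += 1
                pos = mm.find(needle, pos + len(needle))
    return count


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call; hypothetical helper."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    needle = b"\x00\x00\x01\x00"
    for fn in (count_chunked, count_mmap):
        n, secs = timed(fn, sys.argv[1], needle)
        print(f"{fn.__name__}: {n} matches in {secs:.3f}s")
```

Run it as a script with the data file as the only argument, several times, since the first run pays for disk reads that later runs get from the cache.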