Re: Scanning a file

Mike Meyer Fri, 28 Oct 2005 22:45:43 -0700

"Paul Watson" <[EMAIL PROTECTED]> writes:
> Here is a better one that counts, and not just detects, the substring.  This 
> is -much- faster than using mmap; especially for a large file that may cause 
> paging to start.  Using mmap can be -very- slow.
>
> #!/usr/bin/env python
> import sys
>
> fn = 't2.dat'
> ss = '\x00\x00\x01\x00'
>
> be = len(ss) - 1        # length of overlap to check
> blocksize = 64 * 1024    # need to ensure that blocksize > overlap
>
> fp = open(fn, 'rb')
> b = fp.read(blocksize)
> count = 0
> while len(b) > be:
>     count += b.count(ss)
>     b = b[-be:] + fp.read(blocksize)
> fp.close()
>
> print count
> sys.exit(0) 

Did you do timings on it vs. mmap? Having to copy the data multiple
times to deal with the overlap - thanks to strings being immutable -
would seem to be a lose, and makes me wonder how it could be faster
than mmap in general.
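
One way to settle that would be to time both approaches on the same
file. What follows is only a minimal sketch, assuming Python 3 (the
quoted script is Python 2). The names count_chunked, count_mmap and
timed, and the synthetic test file, are made up for illustration and
are not from the original post. Both functions count non-overlapping
matches, and the numbers will depend heavily on file size, available
memory and whether the file is already in the OS cache.

```python
import mmap
import os
import tempfile
import time

NEEDLE = b"\x00\x00\x01\x00"


def count_chunked(path, needle, blocksize=64 * 1024):
    """Count matches by reading blocks and carrying a short tail forward."""
    overlap = len(needle) - 1
    count = 0
    with open(path, "rb") as fp:
        buf = fp.read(blocksize)
        while len(buf) > overlap:
            count += buf.count(needle)
            # Keep the last `overlap` bytes so a match that straddles two
            # blocks is still seen. Guard overlap == 0, since buf[-0:]
            # would return the whole buffer.
            tail = buf[-overlap:] if overlap else b""
            buf = tail + fp.read(blocksize)
    return count


def count_mmap(path, needle):
    """Count matches by searching a read-only memory map of the file."""
    count = 0
    with open(path, "rb") as fp:
        # Note: mapping an empty file raises ValueError.
        with mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(needle)
            while pos != -1:
                count += 1
                pos = mm.find(needle, pos + len(needle))
    return count


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Synthetic data: filler bytes with the needle every 1004 bytes (~4 MB).
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as out:
            out.write((b"\xff" * 1000 + NEEDLE) * 4000)
        for fn in (count_chunked, count_mmap):
            n, secs = timed(fn, path, NEEDLE)
            print(f"{fn.__name__}: {n} matches in {secs:.4f}s")
    finally:
        os.remove(path)
```

A single run on a small, freshly written file mostly measures the
cache, so a fair comparison would repeat this on a file larger than
RAM, which is the case the quoted message is about.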

     Thanks,
     <mike
-- 
Mike Meyer <[EMAIL PROTECTED]>                  http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
