In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] wrote:

> Steve Holden wrote:
> > Indeed, but reading one byte at a time is about the slowest way to
> > process a file, in Python or any other language, because it fails to
> > amortize the overhead cost of function calls over many characters.
> >
> > Buffering wasn't invented because early programmers had nothing better
> > to occupy their minds, remember :-)
>
> Buffer, and then read one byte at a time from the buffer.

Have you measured it?

#!/usr/bin/python
'''Time some file scanning.
'''

import sys, time

# First pass: read the whole file in 256 KB chunks.
f = open(sys.argv[1])
t = time.time()
while True:
    b = f.read(256*1024)
    if not b:
        break
print 'initial read', time.time() - t
f.close()

# Second pass: the same read again, for comparison.
f = open(sys.argv[1])
t = time.time()
while True:
    b = f.read(256*1024)
    if not b:
        break
print 'second read', time.time() - t
f.close()

# Third pass: read in chunks, then visit every character of each chunk.
if 1:
    f = open(sys.argv[1])
    t = time.time()
    while True:
        b = f.read(256*1024)
        if not b:
            break
        for c in b:
            pass
    print 'third chars', time.time() - t
    f.close()

# Fourth pass: count a 4-byte pattern with str.count(), carrying the
# last len(srch)-1 bytes over so matches that straddle two chunks
# are still found.
f = open(sys.argv[1])
t = time.time()
n = 0
srch = '\x00\x00\x01\x00'
laplen = len(srch)-1
lap = ''
while True:
    b = f.read(256*1024)
    if not b:
        break
    n += (lap+b[:laplen]).count(srch)
    n += b.count(srch)
    lap = b[-laplen:]
print 'fourth scan', time.time() - t, n
f.close()

On my (old) system, with a 512 MB file so that it won't all fit in the
buffer, the second time I run it I get:

initial read 14.513395071
second read 14.8771388531
third chars 178.250257969
fourth scan 26.1602909565 1
________________________________________________________________________
TonyN.:' [EMAIL PROTECTED] '
              <http://www.georgeanelson.com/>
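
P.S. For anyone trying this on Python 3, the overlap trick from the fourth scan can be sketched as a function over a binary stream. This is an assumed port, not the script that produced the timings above; the names count_in_chunks and chunk_size are made up for illustration. Like the original, it relies on count(), which only counts non-overlapping matches, so a pattern that can overlap itself may be undercounted.

```python
import io


def count_in_chunks(stream, pattern, chunk_size=256 * 1024):
    """Count occurrences of `pattern` (bytes) in a binary stream.

    Hypothetical helper: reads `chunk_size` bytes at a time and keeps
    the trailing len(pattern)-1 bytes so that matches straddling a
    chunk boundary are not missed.
    """
    laplen = len(pattern) - 1
    lap = b''
    n = 0
    while True:
        b = stream.read(chunk_size)
        if not b:
            break
        # Matches that start in earlier data and end in this chunk.
        n += (lap + b[:laplen]).count(pattern)
        # Matches that lie entirely inside this chunk.
        n += b.count(pattern)
        # Keep the tail; also works when a chunk is shorter than laplen.
        lap = (lap + b)[-laplen:] if laplen else b''
    return n


if __name__ == '__main__':
    data = b'xx\x00\x00\x01\x00yy\x00\x00\x01\x00'
    # A tiny chunk size forces the pattern across chunk boundaries.
    print(count_in_chunks(io.BytesIO(data), b'\x00\x00\x01\x00', chunk_size=3))
```

With a real file you would pass open(path, 'rb') instead of the BytesIO object; opening in binary mode matters on Python 3 because the pattern is bytes, not str.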