Alexandre Ferrieux <[EMAIL PROTECTED]> wrote:
> On Jul 23, 9:36 am, Paul Rubin <http://[EMAIL PROTECTED]> wrote:
>> Alexandre Ferrieux <[EMAIL PROTECTED]> writes:
>> > So I'll reiterate the question: *why* does the Python library add that
>> > extra layer of (hard-headed) buffering on top of stdio's ?
>>
>> readline?
>
> I know readline() doesn't have this problem. I'm asking why the file
> iterator does.
>

Here's a program which can create a large file and either read it with
readline or iterate over the lines. Output from various runs should
answer your question.
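In Python 2 the practical cost of the file iterator's read-ahead buffer is that a "for line in f" loop cannot be freely mixed with f.readline() or other read methods on the same file. When line-at-a-time behaviour matters more than raw speed, a common idiom (it appears in the documentation for the two-argument form of iter()) is to drive the loop with iter(f.readline, ''), which keeps calling readline() until it returns the empty string at end of file. The helper read_until and the scratch file below are hypothetical, made up only to illustrate the idiom; the sketch assumes Python 2.6 or later and also runs on Python 3.

```python
import os
import tempfile


def read_until(f, marker):
    """Hypothetical helper: return lines up to and including the first
    line that starts with marker.

    iter(f.readline, '') calls f.readline() repeatedly and stops when it
    returns '' (end of file), so every line is fetched by readline()
    itself instead of by the file iterator's read-ahead buffer.
    """
    consumed = []
    for line in iter(f.readline, ''):
        consumed.append(line)
        if line.startswith(marker):
            break
    return consumed


def demo():
    # Hypothetical scratch file; tempfile picks the name.
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        with open(path, 'w') as f:
            for i in range(5):
                f.write("line %d\n" % i)
        with open(path, 'r') as f:
            head = read_until(f, 'line 1')
            # The loop only used readline(), so the file position is just
            # past 'line 1' and a plain readline() carries on from there.
            nxt = f.readline()
        return head, nxt
    finally:
        os.remove(path)


if __name__ == '__main__':
    head, nxt = demo()
    print(head)        # ['line 0\n', 'line 1\n']
    print(repr(nxt))   # 'line 2\n'
```

Because every line goes through readline(), this presumably gives up the speed advantage of the plain file iterator.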
The extra buffering means that iterating over a file is about 3 times
faster than repeatedly calling readline.

C:\Temp>test.py create 1000000
create file
Time taken=7.28 seconds

C:\Temp>test.py readline
readline
Time taken=1.03 seconds

C:\Temp>test.py iterate
iterate
Time taken=0.38 seconds

C:\Temp>test.py create 10000000
create file
Time taken=47.28 seconds

C:\Temp>test.py readline
readline
Time taken=10.39 seconds

C:\Temp>test.py iterate
iterate
Time taken=3.58 seconds

------- test.py ------------
# Python 2 (print statement syntax).
# Usage: test.py create [NLINES] | test.py readline | test.py iterate
import time, sys

NLINES = 10

def create():
    print "create file"
    f = open('testfile.txt', 'w')
    for i in range(NLINES):
        print >>f, "This is a test file with a lot of lines"
    f.close()

def readline():
    print "readline"
    f = open('testfile.txt', 'r')
    # Call readline() until it returns '' at end of file.
    while 1:
        line = f.readline()
        if not line:
            break
    f.close()

def iterate():
    print "iterate"
    f = open('testfile.txt', 'r')
    # Use the file iterator, which has the extra read-ahead buffering.
    for line in f:
        pass
    f.close()

def doit(fn):
    start = time.time()
    fn()
    end = time.time()
    print "Time taken=%0.2f seconds" % (end-start)

if __name__=='__main__':
    if len(sys.argv) >= 3:
        NLINES = int(sys.argv[2])
    if sys.argv[1]=='create':
        doit(create)
    elif sys.argv[1]=='readline':
        doit(readline)
    elif sys.argv[1]=='iterate':
        doit(iterate)
----------------------------

--
http://mail.python.org/mailman/listinfo/python-list