Hi,

1) Does this make any sense:

"""
Thus, the loop:

    for line in f:

iterates on each line of the file. Due to buffering issues, interrupting such a loop prematurely (e.g. with break), or calling f.next() instead of f.readline(), leaves the file's position set to an arbitrary value.
"""

The docs say:

"""
next( )
A file object is its own iterator, for example iter(f) returns f (unless f is closed). When a file is used as an iterator, typically in a for loop (for example, for line in f: print line), the next() method is called repeatedly. This method returns the next input line, or raises StopIteration when EOF is hit. In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right. However, using seek() to reposition the file to an absolute position will flush the read-ahead buffer. New in version 2.3.
"""

I experimented with this test code:

    f = open("aaa.txt", "w")
    for i in range(1000):
        f.write("line " + str(i) + "\n")
    f.close()

    f = open("aaa.txt", "r")
    for line in f:
        print f.next()
        print f.readline()
        break
    print f.next()
    print f.readline()
    f.close()

and the output was:

    line 1
    922
    line 2
    line 923

So, it looks like f.readline() is what messes things up--not f.next(). "for line in f" appears to be reading a chunk of the file into a buffer, and then readline() gets the next line after the chunk.

2) Does f.readline() provide any buffering?
It doesn't look like it when I run this code and examine the output:

    f = open("aaa.txt", "w")
    for i in range(3000):
        f.write("line " + str(i) + "\n")
    f.close()

    f = open("aaa.txt", "r")
    for line in f:
        print f.next()
        print f.readline()
    f.close()

The first few lines of the output are:

    line 1
    922
    line 3
    line 923
    line 5
    line 924

(I assume the skipping from 1 to 3 to 5 is caused by the automatic call to f.next() when each pass of the for loop begins, in addition to the explicit call to f.next() inside the loop.)

I interpret the output to mean that the chunk of the file put in the buffer by "for line in f" ends in the middle of line 922, and "print f.readline()" is printing the first line past the buffer. Scrolling down to where "print f.next()" reaches line 922, I see this:

    line 919
    line 1381
    line 921
    line 1382
    line 1384   <-----**
    line 2407   <-----**

which means that when the buffer that was created by "for line in f" is empty, the next chunk, starting directly after readline()'s current file position, is put in the buffer. That indicates that readline() provides no buffering. Then, the next call to readline() jumps to a position after the chunk that was used to replenish the buffer.
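
One way to sanity-check the "ends in the middle of line 922" reading is to work out where a fixed-size chunk would end in the test file. The sketch below is only arithmetic on the "line N\n" records; it does not touch the file object. The 8192-byte chunk size is an assumption on my part (the docs quoted above do not give a size), and split_at() is a hypothetical helper name, not anything from the library.

```python
# Assumption: the hidden read-ahead buffer holds 8192 bytes. The quoted
# docs do not state a size; this is a guess to compare with the output.
READAHEAD_SIZE = 8192


def split_at(offset, total_lines):
    """Find the "line N\\n" record that straddles byte `offset`.

    Returns (line_number, part_before_offset, part_after_offset),
    or None if the whole file is shorter than `offset`.
    """
    pos = 0  # byte position of the start of the current record
    for i in range(total_lines):
        text = "line " + str(i) + "\n"
        if pos + len(text) > offset:
            cut = offset - pos
            return i, text[:cut], text[cut:]
        pos += len(text)
    return None


# Same 1000-line file as in the first experiment.
result = split_at(READAHEAD_SIZE, 1000)
print(result)
```

Under that assumption the chunk boundary falls inside the record for line 922, with "line" before the boundary and " 922" plus the newline after it, which would match the bare "922" that the first readline() printed. It says nothing about how later chunks are positioned.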