Jeffrey Kintscher <websur...@surf2c.net> added the comment:
I ran another C test where I called lseek(fd, 0, SEEK_CUR) in-between each of the stream I/O functions to see how it reports the buffered and kernel file positions:

    opening file with wb
    writing 3 bytes to file
    lseek: 0
    ftell(f): expecting 3, got 3
    lseek: 3
    closing file
    opening file with a+b
    lseek: 3
    ftell(f): expecting 3, got 3
    lseek: 3
    writing 3 bytes to file
    lseek: 3
    ftell(f): expecting 6, got 6
    lseek: 6
    fseek(f, 0, SEEK_SET): expecting 0, got 0
    lseek: 0
    ftell(f): expecting 0, got 0
    lseek: 0
    writing 3 bytes to file
    lseek: 0
    ftell(f): expecting 9, got 9
    lseek: 9
    writing 3 bytes to file
    lseek: 9
    ftell(f): expecting 12, got 12
    lseek: 12
    fseek(f, 0, SEEK_CUR): expecting 0, got 0
    lseek: 12
    ftell(f): expecting 12, got 12
    lseek: 12
    fseek(f, 0, SEEK_SET): expecting 0, got 0
    lseek: 0
    fread(buf, 1, 256, f): expecting 12, got 12
    lseek: 12
    expecting 'abcdefghijkl', got 'abcdefghijkl'
    closing file
    removing file

The C library fseek() and ftell() functions update the kernel file position, but fwrite() doesn't (because it hasn't overflowed its write buffer). The kernel doesn't allow seeking past the end of the file, so it looks like fseek() and ftell() are flushing the C library's write buffer because the subsequent lseek() calls return the same value.
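For comparison, the same kind of probe can be sketched against Python's buffered I/O. This is not the C program that produced the output above; it is a shortened, hypothetical analogue (the helper names positions() and run_probe() are made up for illustration) that pairs each buffered position with the kernel position reported by os.lseek(). It assumes a POSIX-like system and a buffer larger than a few bytes.

```python
import os
import tempfile


def positions(f):
    """Return (buffered position, kernel position) for an open binary file."""
    # lseek(fd, 0, SEEK_CUR) only queries the kernel offset; it does not move it.
    kernel = os.lseek(f.fileno(), 0, os.SEEK_CUR)
    return f.tell(), kernel


def run_probe(path):
    """Replay a shortened version of the sequence and collect the positions."""
    log = []
    with open(path, "wb") as f:
        f.write(b"abc")                      # stays in the write buffer
        log.append(("wb write", *positions(f)))
    with open(path, "a+b") as f:
        log.append(("a+b open", *positions(f)))
        f.write(b"def")
        f.seek(0)                            # flushes, then seeks in the kernel
        log.append(("seek(0)", *positions(f)))
        f.write(b"ghi")                      # append mode: lands at end of file
        f.flush()
        log.append(("write+flush", *positions(f)))
    with open(path, "rb") as f:
        data = f.read()
    return log, data


if __name__ == "__main__":
    fd, tmp_path = tempfile.mkstemp()
    os.close(fd)
    try:
        entries, contents = run_probe(tmp_path)
        for entry in entries:
            print(entry)
        print(contents)
    finally:
        os.remove(tmp_path)
```

Unlike the C run, the Python tell() after the first write does not move the kernel position, which is the difference described in the comparison that follows.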
My observation of Python 3 and C library behavior with read/write append mode:

Buffered write:
    C - writes to a memory buffer; the kernel position is not updated unless a buffer overflow flushes the buffer to the kernel
    Python - same as C

Buffered seek:
    C - the write buffer is flushed to the kernel to update the kernel position, then a kernel seek operation is performed and the new location reported by the kernel is returned
    Python - same as C

Buffered tell:
    C - identical behavior to buffered seek; semantically a wrapper for fseek(f, 0, SEEK_CUR)
    Python - the position is calculated from the last queried kernel position and the dirty buffer length; the kernel position is queried only if it has not already been queried since the file was opened

Having the buffered I/O library keep track of the file position is attractive in that it saves time by skipping a few kernel calls, but, as demonstrated, it can easily get out of sync. I'm sure I can come up with a way to fix the accounting to work properly without querying the kernel, but I don't think that is a good solution.

I see dangers with trying to track the file position in the library independently of the kernel. For one, when the file is opened with the append flag, the kernel moves the position to the end of the file on every write. Yes, that can be tracked with careful code in the library, except when the file is opened two or more times and written to through different file descriptors. This is a common scenario for log files, with appends being performed by multiple processes. The only mitigation is to use file locking to guarantee exclusive file access, but the semantics vary from system to system and locking is not universally supported.

My preferred solution is to follow the C library semantics and turn buffered tell() into a wrapper for buffered seek(0, io.SEEK_CUR).

Please let me know if there are some semantics I haven't considered.
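The preferred solution can be approximated from user code without touching the io module. The sketch below is only an illustration of the idea, not a patch: synced_tell() and demo() are hypothetical names, and the example assumes a POSIX-like system where append mode places every write at the end of the file. The plain tell() result is returned but deliberately not interpreted, since it depends on the Python version in use.

```python
import io
import os
import tempfile


def synced_tell(f):
    """Hypothetical helper: report the position via seek(0, SEEK_CUR).

    The buffered seek flushes pending writes and then asks the kernel,
    so the result reflects where the kernel actually put the data.
    """
    return f.seek(0, io.SEEK_CUR)


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"abcdef")
        with open(path, "a+b") as f:
            f.seek(0)
            f.write(b"ghi")          # buffered; append mode will put it at the end
            reported = f.tell()      # computed by the library, version dependent
            synced = synced_tell(f)  # flush, then query the kernel
            kernel = os.lseek(f.fileno(), 0, os.SEEK_CUR)
        return reported, synced, kernel
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The cost of this approach is one flush and one kernel call per position query, which is the trade-off against library-side accounting discussed above.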
----------
_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36411>
_______________________________________