Jeffrey Kintscher <websur...@surf2c.net> added the comment:

I ran another C test in which I called lseek(fd, 0, SEEK_CUR) between each of 
the stream I/O calls to compare the buffered file position with the kernel 
file position:

opening file with wb
writing 3 bytes to file
lseek: 0
ftell(f): expecting 3, got 3
lseek: 3
closing file
opening file with a+b
lseek: 3
ftell(f): expecting 3, got 3
lseek: 3
writing 3 bytes to file
lseek: 3
ftell(f): expecting 6, got 6
lseek: 6
fseek(f, 0, SEEK_SET): expecting 0, got 0
lseek: 0
ftell(f): expecting 0, got 0
lseek: 0
writing 3 bytes to file
lseek: 0
ftell(f): expecting 9, got 9
lseek: 9
writing 3 bytes to file
lseek: 9
ftell(f): expecting 12, got 12
lseek: 12
fseek(f, 0, SEEK_CUR): expecting 0, got 0
lseek: 12
ftell(f): expecting 12, got 12
lseek: 12
fseek(f, 0, SEEK_SET): expecting 0, got 0
lseek: 0
fread(buf, 1, 256, f): expecting 12, got 12
lseek: 12
expecting 'abcdefghijkl', got 'abcdefghijkl'
closing file
removing file

The C library fseek() and ftell() functions update the kernel file position, 
but fwrite() doesn't (because it hasn't overflowed its write buffer). After 
each fseek() or ftell() call, lseek() reports the same position that the 
library reports, including positions past the old end of the file, so it looks 
like fseek() and ftell() are flushing the C library's write buffer.
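
The same probing technique can be tried from Python. This is a minimal sketch, 
assuming a POSIX-like system and a write that is smaller than the default 
buffer size; the helper names kernel_pos and probe_positions are made up for 
illustration:

```python
import os
import tempfile


def kernel_pos(f):
    """Ask the kernel for the descriptor's offset, bypassing the buffer."""
    return os.lseek(f.fileno(), 0, os.SEEK_CUR)


def probe_positions():
    """Return the kernel offset before a buffered seek, the value the
    buffered seek reports, and the kernel offset afterwards."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"abc")                    # stays in the userspace buffer
            before = kernel_pos(f)             # kernel has not seen the write
            reported = f.seek(0, os.SEEK_CUR)  # flushes, then seeks in the kernel
            after = kernel_pos(f)
        return before, reported, after
    finally:
        os.remove(path)


print(probe_positions())
```

With CPython on Linux this should print (0, 3, 3): the write is invisible to 
the kernel until the buffered seek flushes it.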

My observation of Python 3 and C library behavior with read/write append mode:

Buffered write:
C - write to memory buffer, no kernel position updates unless the buffer is 
flushed to the kernel by buffer overflow
Python - same as C

Buffered seek:
C - write buffer is flushed to the kernel to update the kernel position, then a 
kernel seek operation is performed and the new location reported by the kernel 
is returned
Python - same as C

Buffered tell:
C - identical behavior to buffered seek; semantically a wrapper for fseek(f, 0, 
SEEK_CUR)
Python - the position is calculated from the last queried kernel position and 
the dirty buffer length; the kernel is queried only if its position has not 
been queried since the file was opened
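
To make the difference concrete, here is a toy model of the accounting 
described above for Python's tell(), compared with what the kernel does to a 
descriptor opened with O_APPEND. The class CachedTell and the function 
compare_with_kernel are hypothetical and are not CPython's implementation; 
they only mirror the "last known kernel position plus dirty bytes" rule:

```python
import os
import tempfile


class CachedTell:
    """Toy model: position = last known kernel offset + unflushed bytes."""

    def __init__(self, kernel_offset):
        self.kernel_offset = kernel_offset
        self.dirty = 0

    def write(self, nbytes):
        self.dirty += nbytes          # bytes sitting in the write buffer

    def tell(self):
        return self.kernel_offset + self.dirty


def compare_with_kernel():
    """Return (modelled position, kernel position) after one append."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"abcdef")        # file starts out 6 bytes long
        fd = os.open(path, os.O_RDWR | os.O_APPEND)
        try:
            # Seek to the start and remember what the kernel reported (0).
            model = CachedTell(os.lseek(fd, 0, os.SEEK_SET))
            model.write(3)            # pretend 3 bytes are buffered
            os.write(fd, b"ghi")      # the eventual flush: appended at EOF
            return model.tell(), os.lseek(fd, 0, os.SEEK_CUR)
        finally:
            os.close(fd)
    finally:
        os.remove(path)


print(compare_with_kernel())
```

On a POSIX system the model answers 3 while the kernel answers 9, because the 
append flag moves the offset to the end of the file before the data is written.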


Having the buffered I/O library keep track of the file position is attractive 
in that it saves time by skipping a few kernel calls, but, as demonstrated, it 
can easily get out of sync. I'm sure I can come up with a way to fix the 
accounting to work properly without querying the kernel, but I don't think that 
is a good solution.

I see dangers with trying to track the file position in the library independent 
of the kernel. For one, when the file is opened with the append flag, the 
kernel changes the position to the end of the file with every write. Yes, that 
can be tracked with careful code in the library, except when the file is opened 
two or more times and written to using different file descriptors. This is a 
common scenario for log files, with appends being performed by multiple 
processes. The only mitigation is to use file locking to guarantee exclusive 
file access, but the semantics vary from system to system and it is not 
universally supported.
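
The multiple-descriptor case can be sketched as follows. It assumes POSIX 
O_APPEND semantics; the function name two_appenders and the variable tracked 
(standing in for a position kept by the library) are illustrative only:

```python
import os
import tempfile


def two_appenders():
    """Return (position tracked in userspace, position the kernel reports)
    for descriptor a after another descriptor appends to the same file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    flags = os.O_WRONLY | os.O_APPEND
    a = os.open(path, flags)
    b = os.open(path, flags)
    try:
        os.write(a, b"abc")
        tracked = 3               # what bookkeeping for a alone would record
        os.write(b, b"def")       # a second writer extends the file
        os.write(a, b"ghi")       # kernel appends at the new end of file
        tracked += 3
        actual = os.lseek(a, 0, os.SEEK_CUR)
        return tracked, actual
    finally:
        os.close(a)
        os.close(b)
        os.remove(path)


print(two_appenders())
```

Bookkeeping that only sees descriptor a arrives at 6, while the kernel reports 
9, and nothing visible through a alone reveals the other writer.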

My preferred solution is to follow the C library semantics and turn buffered 
tell() into a wrapper for buffered seek(0, io.SEEK_CUR). Please let me know if 
there are any semantics I haven't considered.
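
As a rough illustration of that proposal from user code (not a patch to the io 
module), the wrapper below reports the position by doing a relative seek of 
zero. The names synced_tell and append_demo are hypothetical, and the sketch 
assumes that append mode places every write at the end of the file:

```python
import io
import os
import tempfile


def synced_tell(f):
    """Flush pending writes and ask the kernel where the file position is,
    by way of a relative seek of zero bytes."""
    return f.seek(0, io.SEEK_CUR)


def append_demo():
    """Return (wrapper position, kernel position, final file contents)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"abc")
        with open(path, "a+b") as f:
            f.write(b"def")
            f.seek(0)                 # flushes "def", kernel offset becomes 0
            f.write(b"ghi")           # buffered; will be appended on flush
            pos = synced_tell(f)
            kernel = os.lseek(f.fileno(), 0, os.SEEK_CUR)
        with open(path, "rb") as f:
            data = f.read()
        return pos, kernel, data
    finally:
        os.remove(path)


print(append_demo())
```

Here the wrapper and the kernel agree, since the value comes from the kernel 
after the flush rather than from bookkeeping in the buffered layer.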

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36411>
_______________________________________