Steven D'Aprano wrote:
> On Wed, 22 Oct 2008 16:59:45 -0400, Terry Reedy wrote:
>> Mike Kent wrote:
>>> Before I file a bug report against Python 2.5.2, I want to run this by
>>> the newsgroup to make sure I'm not [missing something].
>> Good idea ;-). What you are missing is a rereading of the fine manual
>> to see what you missed the first time. I recommend this *whenever* you
>> are having a vexing problem.
> With respect Terry, I think what you have missed is the reason why the
> OP thinks this is a bug.

I think not. I read and responded carefully ;-) I stand by my answer:
the OP should read the doc and try buffer=0 to see if that solves his
problem.

> He's not surprised that buffering is going on:
> "This indicates some sort of buffering and caching is going on."

If one reads the open() doc section on buffering, one will *know* that
the reading is buffered and that this is very intentional, and that one
can turn it off.
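
To illustrate, a minimal sketch (untested, made-up file name; open()'s
third argument is the buffer size, and 0 turns buffering off):

f = open('records.dat', 'rb', 0)   # unbuffered: reads go to the OS

f.seek(0)
print f.read(20)    # read the first record

# ... some other process overwrites the first record here ...

f.seek(0)
print f.read(20)    # re-reads through the OS and sees the new bytes

f.close()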

> but he thinks that the buffering should be discarded when you seek:
> "It seems pretty clear to me that this is wrong. If there is any
> caching going on, it should clearly be discarded if I do a seek.

I don't think Python has any control over this, certainly not in a
platform-independent way, and not after the file has been opened.

For normal sane file reading, discarding the buffer after every seek
would be very wrong. Buffering is an *optional* efficiency measure: it
is normally the right thing to do, and so is the default, but it can be
disabled when it is not, IF ONE READS THE DOC.

> Note that it's not just readline() that's returning me the wrong,
> cached data, as I've also tried this with read(), and I get the same
> results. It's not acceptable that I have to close and reopen the file
> before every read when I'm doing random record access."

And he does not have to do such a thing.

> I think Mike has a point: if a cache is out of sync with the actual
> data, then the cache needs to be thrown away. A bad cache is worse
> than no cache at all.

Right. I told him what to try. If *that* does not work, he can report
back.
Python is not doing the caching. This is OS stuff.

> Surely dealing with files that are being actively changed by other
> processes is hard.

Tail, which sequentially reads what one or more other processes
sequentially write, works fine.
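
A tail-style follow loop is easy to sketch (untested; the file name and
the poll interval are made up):

import time

f = open('log.txt', 'r', 0)   # unbuffered; 'log.txt' is a made-up name
f.seek(0, 2)                  # whence=2: position at the end of the file
while True:
    line = f.readline()
    if line:
        print line,           # trailing comma: line already ends in '\n'
    else:
        time.sleep(0.5)       # nothing new yet; poll again shortly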

> I'm not sure that the solution is anything other than
> "well, don't do that then".

Mixed random access is a different matter. There is a reason DBMSes run
file access through one process.
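
On POSIX, cooperating processes can serialize access themselves with an
advisory lock; a sketch of what such coordination might look like (my
assumption, not how any particular DBMS does it; 'shared.dat' is a
made-up name):

import fcntl

f = open('shared.dat', 'r+b')
fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # block until we hold the lock
try:
    f.seek(0)
    record = f.read(20)
    # ... read or rewrite records; other *lockers* are shut out, but the
    # lock is advisory: processes that never call flock() ignore it ...
finally:
    fcntl.flock(f.fileno(), fcntl.LOCK_UN)
    f.close()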

> How do other programming languages and Unix tools behave? (Windows
> generally only allows a single process to read or write to a file at
> once.)
>
> Additionally, I wonder whether what Mike is seeing is some side-effect
> of file-system caching. Perhaps the bytes written to the file by echo
> are only written to disk when the file is closed? I don't know, I'm
> just hypothesizing.

When echo closes, I expect the disk block will be flushed, which means
added to the pool of blocks ready to be read or written when the disk
driver gets CPU time and gets around to any particular block. Depending
on the file system and driver, blocks may get sorted by disk address to
minimize inter-access seek times (the elevator algorithm).
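
A writer that cannot wait for the driver can ask explicitly; a sketch
(made-up file name; whether the drive's own cache honors the request is
up to the hardware):

import os

f = open('data.txt', 'wb')
f.write('new contents\n')
f.flush()                    # drain the file object's buffer to the OS
os.fsync(f.fileno())         # ask the OS to push its cache to the disk
f.close()
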
Terry Jan Reedy
--
http://mail.python.org/mailman/listinfo/python-list