Steven D'Aprano wrote:
> On Wed, 22 Oct 2008 16:59:45 -0400, Terry Reedy wrote:
>> Mike Kent wrote:
>>> Before I file a bug report against Python 2.5.2, I want to run this by
>>> the newsgroup to make sure I'm not [missing something].
>> Good idea ;-). What you are missing is a rereading of the fine manual
>> to see what you missed the first time. I recommend this *whenever* you
>> are having a vexing problem.
> With respect Terry, I think what you have missed is the reason why the
> OP thinks this is a bug.

I think not. I read and responded carefully ;-) I stand by my answer:
the OP should read the doc and try buffer=0 to see if that solves his
problem.

> He's not surprised that buffering is going on:
> "This indicates some sort of buffering and caching is going on."

If one reads the open() doc section on buffering, one will *know* that
the reading is buffered and that this is very intentional, and that one
can turn it off.
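
To illustrate, a minimal sketch (untested, made-up file name; open()'s
third argument is the buffer size, and 0 turns buffering off):

f = open('records.dat', 'rb', 0)   # unbuffered: reads go to the OS

f.seek(0)
print f.read(20)    # read the first record

# ... some other process overwrites the first record here ...

f.seek(0)
print f.read(20)    # re-reads through the OS and sees the new bytes

f.close()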

> but he thinks that the buffering should be discarded when you seek:
> "It seems pretty clear to me that this is wrong. If there is any
> caching going on, it should clearly be discarded if I do a seek.

I don't think Python has any control over this, certainly not in a
platform-independent way, and not after the file has been opened.

For normal sane file reading, discarding the buffer after every seek
would be very wrong. Buffering is an *optional* efficiency measure: it
is normally the right thing to do, and so is the default, but it can be
disabled when it is not, IF ONE READS THE DOC.

> Note that it's not just readline() that's returning me the wrong,
> cached data, as I've also tried this with read(), and I get the same
> results. It's not acceptable that I have to close and reopen the file
> before every read when I'm doing random record access."

And he does not have to do such a thing.

> I think Mike has a point: if a cache is out of sync with the actual
> data, then the cache needs to be thrown away. A bad cache is worse
> than no cache at all.

Right. I told him what to try. If *that* does not work, he can report
back.
Python is not doing the caching. This is OS stuff.

> Surely dealing with files that are being actively changed by other
> processes is hard.

Tail, which sequentially reads what one or more other processes
sequentially write, works fine.
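
A tail-style follow loop is easy to sketch (untested; the file name and
the poll interval are made up):

import time

f = open('log.txt', 'r', 0)   # unbuffered; 'log.txt' is a made-up name
f.seek(0, 2)                  # whence=2: position at the end of the file
while True:
    line = f.readline()
    if line:
        print line,           # trailing comma: line already ends in '\n'
    else:
        time.sleep(0.5)       # nothing new yet; poll again shortly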

> I'm not sure that the solution is anything other than
> "well, don't do that then".

Mixed random access is a different matter. There is a reason DBMSes run
file access through one process.
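
On POSIX, cooperating processes can serialize access themselves with an
advisory lock; a sketch of what such coordination might look like (my
assumption, not how any particular DBMS does it; 'shared.dat' is a
made-up name):

import fcntl

f = open('shared.dat', 'r+b')
fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # block until we hold the lock
try:
    f.seek(0)
    record = f.read(20)
    # ... read or rewrite records; other *lockers* are shut out, but the
    # lock is advisory: processes that never call flock() ignore it ...
finally:
    fcntl.flock(f.fileno(), fcntl.LOCK_UN)
    f.close()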

> How do other programming languages and Unix tools behave? (Windows
> generally only allows a single process to read or write to a file at
> once.)
>
> Additionally, I wonder whether what Mike is seeing is some side-effect
> of file-system caching. Perhaps the bytes written to the file by echo
> are only written to disk when the file is closed? I don't know, I'm
> just hypothesizing.

When echo closes, I expect the disk block will be flushed, which means
added to the pool of blocks ready to be read or written when the disk
driver gets CPU time and gets around to any particular block. Depending
on the file system and driver, blocks may get sorted by disk address to
minimize inter-access seek times (the elevator algorithm).
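
A writer that cannot wait for the driver can ask explicitly; a sketch
(made-up file name; whether the drive's own cache honors the request is
up to the hardware):

import os

f = open('data.txt', 'wb')
f.write('new contents\n')
f.flush()                    # drain the file object's buffer to the OS
os.fsync(f.fileno())         # ask the OS to push its cache to the disk
f.close()
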
Terry Jan Reedy
--
http://mail.python.org/mailman/listinfo/python-list