On Mon, Jan 25, 2021, 4:25 AM Steven D'Aprano <[email protected]> wrote:

> On Sun, Jan 24, 2021 at 10:43:54PM -0500, Matt Wozniski wrote:
> > And `f.read(1)` needs to pick one of those and return it immediately. It
> > can't wait for more information. The contract of `read` is "Read from
> > underlying buffer until we have n characters or we hit EOF."
>
> In text mode, reads are always buffered:
>
> https://docs.python.org/3/library/functions.html#open
>
> so `f.read(1)` will read as much as needed, so long as it only returns a
> single character.
>

Text mode files are always backed by a buffer, yes, but that's not
relevant. My point is that `f.read(1)` must immediately return a character
if one exists in the buffer. It can't wait for more data to get buffered if
there is already a buffered character, as that would be a
backwards-incompatible change that would badly break line-based protocols
like FTP, SMTP, and POP.

Up until now, `f.read(1)` has always read bytes from the underlying file
descriptor into the buffer until it has one full character, and immediately
returned it. And this is user-facing behavior. Imagine an echo server that
reads 1 character at a time and echoes it back, forever. The client will
only ever send 1 character at a time, so if an eight-bit locale encoding is
in use the client will only send one byte before waiting for a response. As
things stand today, this works. If encoding detection were added and the
server's call to `f.read(1)` could decide that it doesn't know how to decode
the first byte it gets and block until more data comes in, that would be a
deadlock, since the client isn't sending more.
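
To make the echo scenario concrete, here is a minimal sketch of today's behavior. It assumes an anonymous pipe standing in for the socket and latin-1 standing in for the eight-bit locale encoding; both are my choices for illustration, not part of the proposal.

```python
import os

# Anonymous pipe standing in for the client/server connection (assumption).
r_fd, w_fd = os.pipe()

# Wrap the read end as a text-mode file with an eight-bit encoding.
reader = open(r_fd, "r", encoding="latin-1", newline="")

# The "client" sends exactly one byte and then waits for a reply.
os.write(w_fd, b"a")

# One complete character is available, so read(1) returns it right away
# rather than waiting for more data to be buffered.
ch = reader.read(1)
print(repr(ch))

os.close(w_fd)
reader.close()
```

If `read(1)` instead held out for more bytes before committing to a decoding, this script would hang at the `read(1)` call, because nothing else is ever written.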

> A typical buffer size is 4096 bytes, or more.


Sure, but that doesn't mean that much data is always available. If
something has written less than that, it's not reasonable to block until
more data can be buffered in places where up until now no blocking would
have occurred. Not least because there's no guarantee that more data will
ever come.

And if it were to instead make its decisions based on what has been
buffered already, without ever blocking, then the behavior becomes
nondeterministic: it could return a different character based on how much
data the OS returned in the first read syscall.
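
As a rough illustration of that nondeterminism, consider a hypothetical detector (`guess_first_char` is a name I made up, and "try UTF-8, else fall back to latin-1" is only one plausible heuristic) that decides from whatever bytes happen to be buffered:

```python
def guess_first_char(available: bytes) -> str:
    """Hypothetical non-blocking detector: decide from what is buffered now."""
    try:
        # Looks like valid UTF-8 so far: decode it that way.
        return available.decode("utf-8")[0]
    except UnicodeDecodeError:
        # Otherwise fall back to an eight-bit encoding (assumed: latin-1).
        return available.decode("latin-1")[0]


# The writer sent the two bytes b"\xc3\xa9" in both cases.
full = guess_first_char(b"\xc3\xa9")  # both bytes arrived in the first read
partial = guess_first_char(b"\xc3")   # only the first byte had arrived

print(repr(full), repr(partial))
```

The same stream yields U+00E9 in one run and U+00C3 in the other, depending only on how the first read happened to be split.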

> In any case, I believe the intention of this proposal is for *open*, not
> read, to perform the detection.


If that's the case, named pipes are a perfect example of why that's
impossible. It's perfectly normal to open a named pipe that contains no
data, and that won't contain any until you trigger some action (say,
spawning a child process that will write to it). You can't auto-detect the
encoding of an empty pipe, and you can't make open block until data arrives
because it's entirely possible data will never arrive if open blocks.
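
A small POSIX-only sketch of the named pipe case, assuming `os.mkfifo` is available and using a throwaway path in a temporary directory (the path and the use of `select` are just for illustration): right after the pipe is opened there is nothing to inspect, and data only shows up once something acts on the write end.

```python
import os
import select
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.fifo")  # hypothetical FIFO path
    os.mkfifo(path)

    # Open the read end without blocking, then open a write end.
    r_fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    w_fd = os.open(path, os.O_WRONLY)

    # Immediately after opening, no bytes exist to sniff an encoding from.
    ready_before, _, _ = select.select([r_fd], [], [], 0)

    # Data only appears after some later action writes to the pipe.
    os.write(w_fd, b"hi")
    ready_after, _, _ = select.select([r_fd], [], [], 0)

    os.close(w_fd)
    os.close(r_fd)

print(ready_before, ready_after == [r_fd])
```

Any detection done at open time would have to work with the empty `ready_before` state, or else wait for a write that may depend on open returning first.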
_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/GUL5VOYGDEE3MSC2KDWZ7RNDP2ZMJGAS/
Code of Conduct: http://python.org/psf/codeofconduct/