On 13/05/2014 17:08, Ian Kelly wrote:
.........
And since it's so simple, it shouldn't be hard to see that the use of
the shutil module has nothing to do with the Unicode woes here. The
crux of the issue is that a general-purpose command like cat typically
can't know the encoding of its input and can't assume anything about
it. In fact, there may not even be an encoding; cat can be used with
binary data. The only non-destructive approach then is to copy the
binary data straight from the source to the destination with no
decoding steps at all, and trust the user to ensure that the
destination will be able to accommodate the source encoding. Because
Python 3 presents stdin and stdout as text streams, however, they are
more difficult to use with binary data, which is why Armin sets
up all that extra code to make sure his file objects are binary.
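
For illustration, here is a minimal sketch of the "copy the bytes straight through" approach described above. It is not Armin's code; the helper name copy_bytes and the chunk size are made up for this example. The only assumption about Python itself is that the text-mode standard streams expose their underlying binary streams as sys.stdin.buffer and sys.stdout.buffer, which holds for the normal streams but may not for replaced ones.

```python
import io
import shutil
import sys


def copy_bytes(src, dst, chunk_size=64 * 1024):
    """Copy src to dst as raw bytes, with no decoding step at all.

    Both arguments must be binary file objects. The name and the
    default chunk size are hypothetical choices for this sketch.
    """
    shutil.copyfileobj(src, dst, chunk_size)


# Demonstration with in-memory binary streams. The data is deliberately
# not valid UTF-8, so any decoding step would fail or corrupt it.
source = io.BytesIO(b"\xff\xfe\x00 not text \x80\n")
sink = io.BytesIO()
copy_bytes(source, sink)
copied = sink.getvalue()
```

A cat-like script would call copy_bytes(sys.stdin.buffer, sys.stdout.buffer) and then flush sys.stdout.buffer, leaving it to the user to make sure the destination can cope with whatever encoding the source happens to use.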
Doesn't this issue also come up wherever bytes are being read, i.e. in sockets,
pipe file handles etc.? Some sources may have well-defined encodings and so allow
the use of unicode strings, but surely not all. I imagine all of the problems
associated with a broken encoding promise for stdin can also occur with sockets
and other sources, e.g. error messages failing to be printable. Since bytes
in Python 3 are not equivalent to the old str (Python 3 bytes != Python 2 str),
using bytes everywhere has its own problems.
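
To make the two points above concrete, here is a rough sketch. The payload stands in for data received from a socket or pipe whose encoding promise turned out to be wrong; the helper name printable is invented for this example, and it assumes Python 3.5 or later, where the backslashreplace error handler is also accepted when decoding.

```python
def printable(data, encoding="utf-8"):
    """Decode bytes for display without raising UnicodeDecodeError.

    Undecodable bytes are shown as \\xNN escapes instead of aborting.
    Hypothetical helper; needs Python 3.5+ for backslashreplace on decode.
    """
    return data.decode(encoding, errors="backslashreplace")


# Pretend this arrived from a peer that promised UTF-8 but sent Latin-1.
payload = b"caf\xe9 ok"

# A strict decode fails, which is how an error message can end up
# unprintable when it contains data from such a source.
try:
    payload.decode("utf-8")
except UnicodeDecodeError as exc:
    strict_error = exc
else:
    strict_error = None

# The lenient decode always yields something that can be printed.
safe_text = printable(payload)

# Python 3 bytes do not behave like the Python 2 str: indexing gives
# an int, and only slicing gives back a bytes object.
first_item = payload[0]
first_slice = payload[0:1]
```

Here first_item is an int while first_slice is a bytes object of length one, which is one of the ways code written for the old str breaks when it is simply switched over to bytes.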
--
Robin Becker