On Sun, Jun 12, 2016, at 22:22, Steven D'Aprano wrote:
> That's because base64 is a bytes-to-bytes transformation. It has
> nothing to do with unicode encodings.

Nonsense. base64 is a binary-to-text encoding scheme. The output range is
specifically chosen to be safe to transmit in text protocols.

> > That is, the b64_encoded_data variable is of type 'bytes' and when
> > you peek inside it's a string (made up of what seems to be only
> > characters that exist in Base 64).
>
> If you print or otherwise display bytes, for the convenience of human
> beings, those bytes are displayed as if they were ASCII. E.g. the byte
> 0x61 is displayed as 'a'. Good idea? Bad idea? I can see arguments
> either way, but that's how it is.

There's absolutely no rational basis for choosing "0x41-0x5A, 0x61-0x7A,
0x30-0x39, 0x2B, 0x2F" as the output range (in ASCII, the characters
A-Z, a-z, 0-9, '+' and '/') except for what characters those values
represent in ASCII. And if you needed to smuggle some binary data through
an EBCDIC system in the same manner, you would naturally wish to convert
it to the EBCDIC bytes corresponding to those same characters.

-- 
https://mail.python.org/mailman/listinfo/python-list
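A minimal Python 3 sketch of the argument above, using only the standard library. It checks that every byte `base64.b64encode` produces is the ASCII code of a base64 alphabet character (plus '=' for padding), then carries the same characters through EBCDIC. The choice of the `cp037` codec is an assumption made for illustration: it is one EBCDIC code page that Python ships with, and a real EBCDIC system might use another (e.g. `cp500`).

```python
import base64
import string

# Standard base64 alphabet: A-Z, a-z, 0-9, '+', '/' (i.e. ASCII 0x41-0x5A,
# 0x61-0x7A, 0x30-0x39, 0x2B, 0x2F). '=' (0x3D) is used only for padding.
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

payload = bytes(range(256))  # arbitrary binary data, including non-ASCII bytes

# b64encode returns a bytes object...
encoded = base64.b64encode(payload)

# ...but every byte in it is the ASCII code of an alphabet character or '='.
allowed = set(B64_ALPHABET.encode("ascii")) | {ord("=")}
assert set(encoded) <= allowed

# Treat the output as the text it really is: decode the ASCII bytes to str.
text = encoded.decode("ascii")

# To pass the same characters through an EBCDIC system, encode that text
# with an EBCDIC codec (cp037 here, as an assumed example code page).
# The byte values change ('A' becomes 0xC1), but the characters do not.
ebcdic_bytes = text.encode("cp037")

# Receiving side: EBCDIC bytes -> characters -> original binary data.
roundtrip = base64.b64decode(ebcdic_bytes.decode("cp037"))
assert roundtrip == payload
```

Feeding `ebcdic_bytes` straight to `b64decode` without first decoding it from `cp037` would not recover the payload, which is the point: the encoding is defined in terms of characters, and the ASCII byte values are just one way of representing them.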