On Mon, Jun 13, 2016, at 06:35, Steven D'Aprano wrote:
> But this is a Python forum, and Python 3 is a language that tries
> very, very hard to keep a clean separation between bytes and text,

Yes, but that doesn't mean that you're right about which side of that
divide base64 output belongs on.

> where text is understood to mean Unicode, not a subset of ASCII-
> encoded bytes.

Sure. But let's not pretend that U+0020 through U+007E *aren't* Unicode
characters. Base64's output is characters. Those characters could be
encoded as ASCII, as UTF-32, or as EBCDIC, and they would still be the
same characters.

At http://pubs.opengroup.org/onlinepubs/9699919799/utilities/uuencode.html
you can see, in the rationale section, a specific mention of using
base64 with EBCDIC. It gives the fact that all of base64's characters
are invariant across every EBCDIC encoding as part of the reason base64
uses the characters it does (as opposed to the historical uuencode
algorithm's 0x20 through 0x5F, or to some other choice of
non-alphanumeric characters than + / =).

The fact that many historical standards do mix text with ASCII-encoded
bytes and treat them interchangeably, as you said, does mean that you
have to read carefully to see which one they mean. The problem with
your argument, though, is that in base64's case it clearly *is* text.
For example, from the original privacy-enhanced mail standards, the
very first application of base64:

RFC 989:
    "1. (Local_Form) The message text is created (e.g., via an editor)
    in the system's native character set, with lines delimited in
    accordance with local convention."

RFC 1421:
    "A plaintext message is accepted in local form, using the host's
    native character set and line representation."

And specifically, in its description of base64 ("printable encoding"):
    "Proceeding from left to right, the bit string resulting from step 3
    is encoded into characters which are universally representable at
    all sites, though not necessarily with the same bit patterns (e.g.,
    although the character "E" is represented in an ASCII-based system
    as hexadecimal 45 and as hexadecimal C5 in an EBCDIC-based system,
    the local significance of the two representations is equivalent)."
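To put the RFC 1421 point in Python 3 terms, here is a minimal sketch. It uses the standard library's cp037 codec as a stand-in for "an EBCDIC-based system" (my choice, not something the RFC specifies), and the variable names are just for illustration. base64.b64encode hands back bytes, but once that output is decoded to str, the same characters can be serialized as ASCII, UTF-32 or EBCDIC, with different bit patterns that all decode back to identical text:

```python
import base64

# b64encode returns bytes in Python 3; decoding as ASCII gives the
# base64 output as a str, i.e. a sequence of characters.
text = base64.b64encode(b"hello").decode("ascii")

# The same characters, serialized three different ways.
as_ascii = text.encode("ascii")
as_utf32 = text.encode("utf-32")   # includes a BOM; decode strips it
as_ebcdic = text.encode("cp037")   # cp037: an EBCDIC code page (US/Canada)

# Different bit patterns...
print(as_ascii.hex())
print(as_ebcdic.hex())

# ...but each one decodes back to exactly the same characters.
for data, codec in [
    (as_ascii, "ascii"),
    (as_utf32, "utf-32"),
    (as_ebcdic, "cp037"),
]:
    assert data.decode(codec) == text

# RFC 1421's own example: "E" is 0x45 in ASCII and 0xC5 in EBCDIC.
print("E".encode("ascii").hex(), "E".encode("cp037").hex())  # prints: 45 c5

# b64decode also accepts an ASCII-only str directly.
print(base64.b64decode(text))  # prints: b'hello'
```

Whichever byte serialization is chosen, decoding yields the same base64 text, which is the sense in which the RFC calls the local representations "equivalent".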
-- https://mail.python.org/mailman/listinfo/python-list