Martin v. Löwis added the comment: "4. Many Internet standards are defined in terms of textual data"
I believe the author was thinking of the "old" TCP-based protocols (ftp, smtp, RFC 822, HTTP), which have their commands/messages as ASCII-strings, with a variable-length records (often terminated by line end). I think bringing this up as an argument against UTF-32 somewhat flawed, for two reasons: 1. Historically, many of these protocols restricted themselves to pure ASCII, so using UTF-8 is as much a protocol violation as is using UTF-32. 2. The tricky part in this protocols is often not the risk of embedding NUL, but embedding CRLF (as 0D 0A might well appear in a character, a.g. MALAYALAM LETTER UU) OTOH, it is a fact that several of these protocols got revised to support Unicode, and often re-interpreting the data as UTF-8 (with MIME being the notable exception that actually allows for UTF-32 on the wire if somebody choses to). ---------- nosy: +loewis _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue20906> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com