[issue20906] Issues in Unicode HOWTO

Martin v. Löwis Sat, 22 Mar 2014 05:23:27 -0700

Martin v. Löwis added the comment:

"4. Many Internet standards are defined in terms of textual data"


I believe the author was thinking of the "old" TCP-based protocols (FTP, SMTP, 
RFC 822, HTTP), which have their commands/messages as ASCII strings, with 
variable-length records (often terminated by a line end).
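The framing these protocols rely on can be sketched in a few lines of Python. This is a simplified illustration, not code from any real server: `read_commands` is a hypothetical helper, and the SMTP-like commands are made-up sample input.

```python
CRLF = b"\r\n"

def read_commands(stream: bytes):
    """Split a byte stream into CRLF-terminated ASCII command lines.

    Hypothetical helper for illustration; real servers read incrementally.
    Returns the complete commands plus any trailing, unterminated bytes.
    """
    *complete, remainder = stream.split(CRLF)
    return [line.decode("ascii") for line in complete], remainder


cmds, rest = read_commands(b"HELO example.org\r\nMAIL FROM:<a@example.org>\r\nRCPT")
print(cmds)  # ['HELO example.org', 'MAIL FROM:<a@example.org>']
print(rest)  # b'RCPT'
```

The parser never interprets a record to find where it ends; it only scans for the two bytes 0D 0A, which is why the encoding of the payload matters.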

I think bringing this up as an argument against UTF-32 is somewhat flawed, for 
two reasons:
1. Historically, many of these protocols restricted themselves to pure ASCII, 
so using UTF-8 is as much a protocol violation as is using UTF-32.
2. The tricky part in these protocols is often not the risk of embedding NUL, 
but embedding CRLF (as 0D 0A might well appear in a character, e.g. MALAYALAM 
LETTER UU, U+0D0A, whose big-endian UTF-32 encoding is 00 00 0D 0A).
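Point 2 can be checked directly in Python. The sketch below assumes big-endian UTF-32 without a BOM (`utf-32-be`, i.e. network byte order); `split_records` is a hypothetical stand-in for the CRLF framing a line-oriented server would use.

```python
CRLF = b"\r\n"

def split_records(data: bytes):
    """Naively split a byte stream on CRLF (hypothetical framing helper)."""
    return data.split(CRLF)

uu = "\u0d0a"  # MALAYALAM LETTER UU

utf32 = uu.encode("utf-32-be")  # 00 00 0D 0A: contains CR LF
utf8 = uu.encode("utf-8")       # E0 B4 8A: no control bytes at all

print(utf32.hex(" "))  # 00 00 0d 0a
print(utf8.hex(" "))   # e0 b4 8a

# With UTF-32 the framing layer cuts a single character in half.
print(split_records(utf32))  # [b'\x00\x00', b'']

# UTF-32 also puts NUL bytes into plain ASCII text.
print("A".encode("utf-32-be"))  # b'\x00\x00\x00A'

# Every byte of a multi-byte UTF-8 sequence is >= 0x80, so CR, LF and
# NUL bytes can only ever stand for themselves.
print(CRLF in utf8, all(b >= 0x80 for b in utf8))  # False True
```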

OTOH, it is a fact that several of these protocols got revised to support 
Unicode, often by re-interpreting the data as UTF-8 (with MIME being the 
notable exception that actually allows for UTF-32 on the wire if somebody 
chooses to).
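Re-interpreting the data as UTF-8 works because pure-ASCII input decodes identically under both codecs, so old peers keep working while the same CRLF framing now carries non-ASCII text. A minimal sketch: `decode_record` and the sample address are hypothetical, and this is not the actual wire format of any particular revised protocol.

```python
CRLF = b"\r\n"

def decode_record(record: bytes) -> str:
    """Decode one CRLF-delimited record as UTF-8 instead of ASCII (hypothetical helper)."""
    return record.decode("utf-8")

# A legacy, pure-ASCII record decodes to the same string either way.
legacy = b"HELO example.org"
print(legacy.decode("ascii") == decode_record(legacy))  # True

# Non-ASCII text now fits into the same line-based framing.
stream = "MAIL FROM:<j\u00fcrgen@example.org>\r\nQUIT\r\n".encode("utf-8")
records = stream.split(CRLF)[:-1]  # drop the empty tail after the last CRLF
print([decode_record(r) for r in records])
# ['MAIL FROM:<jürgen@example.org>', 'QUIT']
```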

----------
nosy: +loewis

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20906>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
