Martin v. Löwis <mar...@v.loewis.de> added the comment:

> I would have thought that someone who intended a Unicode hostname
> to be looked up in its IDNA form would have encoded it using
> IDNA, rather than an 8-bit encoding - how many C programs would
> transcode the name that way, rather than just passing the char *
> from one interface to another?

Well, Python is not C. In Python, you would pass a str, and expect it
to work, which means it will get automatically encoded with IDNA.

> In fact, I would think that non-ASCII bytes in a hostname most
> probably indicated that a name resolution mechanism other than
> the DNS was in use, and that the byte string should be passed
> unaltered just as a typical C program would.

I'm not talking about byte strings, but character strings.

> I don't object to that, but it does force a choice between
> decoding an 8-bit name for display (e.g. by using the locale
> encoding), and decoding it to round-trip automatically (e.g. by
> using ASCII/surrogateescape, with support on the encoding side).

In the face of ambiguity, refuse the temptation to guess.

> So overall, I do think it is better to decode names for automatic
> round-tripping rather than for display, but my main concern is
> simply that it should be possible to recover the original bytes
> so that round-tripping is at least possible.

Marc-Andre wants gethostname to use the Wide API on Windows, which,
in theory, allows for cases where round-tripping to bytes is
impossible.

----------
title: socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names -> socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9377>
_______________________________________
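
For illustration only, the two cases discussed in this comment can be sketched in Python: a str hostname encoded with the built-in "idna" codec, and an 8-bit name decoded with ASCII/surrogateescape so that the original bytes can be recovered. The hostnames below are made-up examples, and the sketch calls the codecs directly rather than going through the socket module.

```python
# A str hostname containing a non-ASCII character (hypothetical name).
name = "bücher.example"

# Encoding with the built-in "idna" codec yields the ASCII form used in DNS.
encoded = name.encode("idna")
print(encoded)  # b'xn--bcher-kva.example'

# An 8-bit hostname in an unknown encoding (hypothetical bytes).
raw = b"h\xf4te.example"

# Decoding with ASCII/surrogateescape maps each undecodable byte to a
# lone surrogate (0xF4 becomes U+DCF4), so nothing is lost.
text = raw.decode("ascii", "surrogateescape")

# Encoding with the same error handler recovers the original bytes.
roundtrip = text.encode("ascii", "surrogateescape")
print(roundtrip == raw)  # True
```

The surrogateescape form round-trips but is not suitable for display, which is the trade-off described in the quoted text above.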