Martin v. Löwis <mar...@v.loewis.de> added the comment:

> I would have thought that someone who intended a Unicode hostname
> to be looked up in its IDNA form would have encoded it using
> IDNA, rather than an 8-bit encoding - how many C programs would
> transcode the name that way, rather than just passing the char *
> from one interface to another?

Well, Python is not C. In Python, you would pass a str, and
expect it to work, which means it will get automatically encoded
with IDNA.
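To illustrate what "automatically encoded with IDNA" means for a str host name: CPython's socket module converts str host arguments with the built-in "idna" codec (IDNA 2003). The sketch below performs that step by hand. The helper name `to_wire_name` and the host name "bücher.example" are hypothetical and chosen only for illustration.

```python
def to_wire_name(hostname: str) -> bytes:
    """Hypothetical helper: encode a str host name the way socket does for str input.

    The built-in "idna" codec turns each non-ASCII label into its
    ASCII-compatible "xn--" (Punycode) form and leaves ASCII labels alone.
    """
    return hostname.encode("idna")


print(to_wire_name("bücher.example"))  # b'xn--bcher-kva.example'
print(to_wire_name("example.com"))     # b'example.com'
```

The point of the argument is visible here. Code that passes a str never chooses an 8-bit encoding, so the only encoding it can reasonably expect is IDNA.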

> In fact, I would think that non-ASCII bytes in a hostname most
> probably indicated that a name resolution mechanism other than
> the DNS was in use, and that the byte string should be passed
> unaltered just as a typical C program would.

I'm not talking about byte strings, but character strings.

> I don't object to that, but it does force a choice between
> decoding an 8-bit name for display (e.g. by using the locale
> encoding), and decoding it to round-trip automatically (e.g. by
> using ASCII/surrogateescape, with support on the encoding side).
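The two decoding choices in the quoted paragraph can be compared on a concrete byte string. This is a minimal sketch. The raw name `b"caf\xe9.example"` and the three helper functions are hypothetical, and Latin-1 stands in for "the locale encoding". Decoding for display yields readable text but depends on a guess about the encoding. ASCII with the PEP 383 "surrogateescape" handler maps each non-ASCII byte 0xXY to the lone surrogate U+DCXY, so encoding with the same handler recovers the exact original bytes.

```python
# Hypothetical host name bytes containing one non-ASCII byte (0xE9).
raw = b"caf\xe9.example"


def decode_for_display(name: bytes, encoding: str = "latin-1") -> str:
    # Readable result, but the encoding is an assumption about the bytes.
    return name.decode(encoding)


def decode_for_round_trip(name: bytes) -> str:
    # PEP 383: undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
    return name.decode("ascii", "surrogateescape")


def encode_round_trip(name: str) -> bytes:
    # The encoding side must use the same handler to get the bytes back.
    return name.encode("ascii", "surrogateescape")


shown = decode_for_display(raw)      # 'café.example'
escaped = decode_for_round_trip(raw)  # 'caf\udce9.example'
assert encode_round_trip(escaped) == raw
```

The display form does not round-trip through the ASCII/surrogateescape encoder. "é" (U+00E9) is not an escaped surrogate, so `encode_round_trip(shown)` raises `UnicodeEncodeError`. This is the forced choice the quoted paragraph describes.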

In the face of ambiguity, refuse the temptation to guess.

> So overall, I do think it is better to decode names for automatic
> round-tripping rather than for display, but my main concern is
> simply that it should be possible to recover the original bytes
> so that round-tripping is at least possible.

Marc-Andre wants gethostname to use the Wide API on Windows, which,
in theory, allows for cases where round-tripping to bytes is
impossible.
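Here is a rough illustration of why a Wide-API result may have no byte form. This sketch rests on a few assumptions: the name arrives as arbitrary Unicode text, it must be converted to a narrow Windows code page (cp1252 is used here purely as an example), and the helper `to_code_page_bytes` and both host names are hypothetical. Characters outside the code page have no byte representation. "surrogateescape" does not help in this case because it only maps the lone surrogates U+DC80..U+DCFF back to bytes.

```python
from typing import Optional


def to_code_page_bytes(name: str, code_page: str = "cp1252") -> Optional[bytes]:
    """Hypothetical helper: convert a Unicode host name to code-page bytes.

    Returns None when the name cannot be represented. surrogateescape
    re-raises for any character that is not an escaped byte (U+DC80..U+DCFF).
    """
    try:
        return name.encode(code_page, "surrogateescape")
    except UnicodeEncodeError:
        return None


print(to_code_page_bytes("café.example"))    # b'caf\xe9.example'
print(to_code_page_bytes("\u4f8b.example"))  # None: U+4F8B is not in cp1252
```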

----------
title: socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names -> 
socket,  PEP 383: Mishandling of non-ASCII bytes in host/domain names

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9377>
_______________________________________
_______________________________________________
Python-bugs-list mailing list