David Watson <bai...@users.sourceforge.net> added the comment:

> > In fact, I would think that non-ASCII bytes in a hostname most
> > probably indicated that a name resolution mechanism other than
> > the DNS was in use, and that the byte string should be passed
> > unaltered just as a typical C program would.
>
> I'm not talking about byte strings, but character strings.

I mean that passing the str object from socket.gethostname() to the
Python lookup function ought to result in the same byte string being
passed to the C lookup function as was returned by the C gethostname()
function (or else that the programmer must re-encode the str to ensure
that that result is obtained).

> > I don't object to that, but it does force a choice between
> > decoding an 8-bit name for display (e.g. by using the locale
> > encoding), and decoding it to round-trip automatically (e.g. by
> > using ASCII/surrogateescape, with support on the encoding side).
>
> In the face of ambiguity, refuse the temptation to guess.

Yes, I would interpret that to mean not using the locale encoding for
data obtained from the network. That's another reason why the
ASCII/surrogateescape scheme appeals to me more.

> Well, Python is not C. In Python, you would pass a str, and
> expect it to work, which means it will get automatically encoded
> with IDNA.

I think there might be a misunderstanding here - I've never proposed
changing the interpretation of Unicode characters in hostname
arguments. The ASCII/surrogateescape scheme I suggested only changes
the interpretation of unpaired surrogate codes, as they do not occur
in IDNs or any other genuine Unicode data; all IDNs, including those
solely consisting of ASCII characters, would be encoded to the same
byte sequence as before.

ASCII/surrogateescape decoding could also be used without support on
the encoding side - that would satisfy the requirement to "refuse the
temptation to guess", would allow the original bytes to be recovered,
and would mean that attempting to look up a non-ASCII result in str
form would raise an exception rather than looking up the wrong name.

> Marc-Andre wants gethostname to use the Wide API on Windows, which,
> in theory, allows for cases where round-tripping to bytes is
> impossible.

Well, the name resolution APIs wrapped by Python are all byte-oriented,
so if the computer name were to have no bytes equivalent then it
wouldn't be possible to resolve it anyway, and an exception rightly
ought to be raised at some point in the process of trying to do so.

----------
title: socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names -> socket, PEP 383: Mishandling of non-ASCII bytes in host/domain names

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9377>
_______________________________________
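
For illustration, here is a minimal sketch of the ASCII/surrogateescape scheme discussed in the comment above. It is not the patch attached to the issue: encode_hostname() is a hypothetical helper, not part of the socket module, and the byte string b"b\xfccher" is an assumed example of an 8-bit name. The sketch only shows that lone surrogates round-trip to the original bytes while ordinary names keep their existing IDNA encoding.

```python
def encode_hostname(name: str) -> bytes:
    """Hypothetical helper (not part of the socket module)."""
    # Lone surrogates U+DC80..U+DCFF are what surrogateescape produces for
    # undecodable bytes 0x80..0xFF; they do not occur in genuine IDNs.
    if any(0xDC80 <= ord(ch) <= 0xDCFF for ch in name):
        return name.encode("ascii", "surrogateescape")
    # Every other name keeps the existing IDNA interpretation.
    return name.encode("idna")


# Assumed example: an 8-bit name as the C gethostname() might return it.
raw = b"b\xfccher"

# Decoding side: each non-ASCII byte becomes a lone surrogate, so no
# guess is made about the encoding of the original bytes.
name = raw.decode("ascii", "surrogateescape")

# Without encoding-side support, a strict encode of the escaped name
# raises instead of silently producing a different name.
try:
    name.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With encoding-side support, the original bytes are recovered.
round_tripped = encode_hostname(name)
```

With only the decoding half in place, a programmer could still recover the bytes explicitly with name.encode("ascii", "surrogateescape") and pass them to the lookup function as a bytes object.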