STINNER Victor added the comment:

> The default encoding in the C/POSIX locale is ASCII (which is the entire
> source of the problem).
The reality is more complex than that :-) It depends on the OS. Some operating systems use Latin1 for the POSIX locale. Others announce ASCII for the POSIX locale, but use Latin1 in practice :-) On these lying operating systems, Python decodes bytes 0x80..0xff using mbstowcs() to check whether we get ASCII or Latin1: see the check_force_ascii() function.

/* Workaround FreeBSD and OpenIndiana locale encoding issue with the C locale.
   On these operating systems, nl_langinfo(CODESET) announces an alias of the
   ASCII encoding, whereas mbstowcs() and wcstombs() functions use the
   ISO-8859-1 encoding. The problem is that os.fsencode() and os.fsdecode() use
   locale.getpreferredencoding() codec. For example, if command line arguments
   are decoded by mbstowcs() and encoded back by os.fsencode(), we get a
   UnicodeEncodeError instead of retrieving the original byte string.

   The workaround is enabled if setlocale(LC_CTYPE, NULL) returns "C",
   nl_langinfo(CODESET) announces "ascii" (or an alias to ASCII), and at least
   one byte in range 0x80-0xff can be decoded from the locale encoding. The
   workaround is also enabled on error, for example if getting the locale
   failed.

   (...) */

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28180>
_______________________________________
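
To make the failure mode described in the comment concrete, here is a minimal Python sketch. It is only a model, not the CPython implementation: roundtrip() and workaround_needed() are hypothetical helpers, and the "actual_codec" argument stands in for whatever mbstowcs() really does on the platform. The first helper shows why decoding with one codec and encoding with another loses the original bytes; the second mirrors the three conditions listed in the comment.

```python
import codecs
from typing import Optional


def roundtrip(raw: bytes, decode_codec: str, encode_codec: str) -> bytes:
    """Decode raw bytes with one codec and encode the text back with another.

    Models "decoded by mbstowcs(), encoded back by os.fsencode()" when the
    two sides disagree about the locale encoding.
    """
    return raw.decode(decode_codec).encode(encode_codec)


def workaround_needed(lc_ctype: Optional[str], announced: str,
                      actual_codec: str) -> bool:
    """Hypothetical model of the conditions quoted in the comment.

    lc_ctype     -- what setlocale(LC_CTYPE, NULL) would return, or None
                    if getting the locale failed
    announced    -- what nl_langinfo(CODESET) would announce
    actual_codec -- assumption: the codec the C library really applies
    """
    if lc_ctype is None:
        # "The workaround is also enabled on error."
        return True
    if lc_ctype != "C":
        return False
    try:
        if codecs.lookup(announced).name != "ascii":
            return False
    except LookupError:
        # Unknown name: not an alias of ASCII as far as this sketch knows.
        return False
    # Enabled if at least one byte in 0x80-0xff can be decoded.
    for value in range(0x80, 0x100):
        try:
            bytes([value]).decode(actual_codec)
        except UnicodeDecodeError:
            continue
        return True
    return False


if __name__ == "__main__":
    raw = b"caf\xe9"
    print(roundtrip(raw, "latin-1", "latin-1") == raw)
    try:
        roundtrip(raw, "latin-1", "ascii")
    except UnicodeEncodeError as exc:
        print("mismatch:", exc.reason)
    print(workaround_needed("C", "US-ASCII", "latin-1"))
```

With consistent codecs the byte string survives the round trip; with a Latin1 decoder and an ASCII encoder, the non-ASCII character cannot be encoded back, which is the UnicodeEncodeError the comment mentions.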