STINNER Victor <vstin...@python.org> added the comment:
I created this issue while reviewing the implementation of the PEP 597: PR 19481. Here is a copy of my comments on the PR related to this issue.

_locale.get_locale_encoding() calls _Py_GetLocaleEncoding(), which returns UTF-8 if the Python UTF-8 Mode is enabled. Maybe the function could have a flag: "please don't lie to me and return the current locale encoding" ;-)

Or we could add a function to get the *current* locale encoding: **locale.get_current_locale_encoding()**. This one would ignore the UTF-8 Mode and call nl_langinfo(CODESET).

There are APIs to use the *current* locale encoding: PyUnicode_EncodeLocale()/PyUnicode_DecodeLocale() and _Py_EncodeLocaleEx()/_Py_DecodeLocaleEx() with current_locale=1. You can see which functions use it:

* decode the tm_zone field of localtime_r() and gmtime() results
* decode the tzname[0] and tzname[1] strings
* decode the setlocale() result
* decode some localeconv() fields (this function requires temporarily switching to a different locale encoding, which is bad!)
* decode the nl_langinfo() result
* decode the gettext(), dgettext(), dcgettext(), textdomain(), bindtextdomain() and bind_textdomain_codeset() results
* decode the strerror() and dlerror() results
* encode/decode in the readline module
* encode the format string for strftime() in time.strftime() (only used on Windows; Unix provides wcsftime()), then decode the strftime() result

> encoding="locale" : Uses locale encoding regardless UTF-8 mode.

Currently, open(encoding=None) doesn't work like that. For example, on macOS, Android and VxWorks, it always uses UTF-8. And if the UTF-8 Mode is enabled, UTF-8 is used.

In the PEP 597, I read that encoding="locale" is the same as encoding=None, except that it doesn't emit an EncodingWarning. Where does the PEP 597 change the chosen encoding for the encoding=None case?

The PEP says "locale encoding" without specifying exactly what it is. In Python, it means different things depending on the context. There is a subtle difference between the **current** locale encoding and "the locale encoding".
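To make the distinction concrete, here is a minimal sketch (Unix-oriented, Python 3.7+ assumed) that prints the different "locale encodings" Python exposes side by side. The helper name describe_locale_encodings() is hypothetical, not an existing API; it only calls public functions: sys.flags.utf8_mode, locale.getpreferredencoding(False), sys.getfilesystemencoding() and locale.nl_langinfo(locale.CODESET). In UTF-8 Mode (-X utf8), getpreferredencoding(False) and the filesystem encoding should both report UTF-8, while nl_langinfo(CODESET) still reports whatever the current LC_CTYPE locale says.

```python
import locale
import sys


def describe_locale_encodings() -> dict:
    """Collect the different notions of "locale encoding" visible from Python.

    Hypothetical helper for illustration. Assumes a Unix-like platform where
    locale.nl_langinfo() exists; elsewhere "codeset" is reported as None.
    """
    info = {
        # 1 if the Python UTF-8 Mode is enabled (-X utf8 or PYTHONUTF8=1)
        "utf8_mode": sys.flags.utf8_mode,
        # Encoding picked by open(encoding=None); always UTF-8 in UTF-8 Mode
        "preferred": locale.getpreferredencoding(False),
        # Encoding chosen at startup for file names; fixed for the process
        "filesystem": sys.getfilesystemencoding(),
        "codeset": None,
    }
    if hasattr(locale, "nl_langinfo"):
        # The *current* LC_CTYPE encoding; the UTF-8 Mode is ignored here
        info["codeset"] = locale.nl_langinfo(locale.CODESET)
    return info


if __name__ == "__main__":
    for name, value in describe_locale_encodings().items():
        print(f"{name:>10}: {value}")
```

Running it once normally and once with `python -X utf8` under a non-UTF-8 locale is a quick way to see "preferred" and "codeset" disagree.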
I agree that it needs some clarification :-)

While we are discussing encodings: I never understood why open() gets the current locale encoding from nl_langinfo(CODESET), an encoding which can change at runtime while Python is running. For example, if thread A calls open(filename, encoding=None), thread B calls locale.localeconv(), and the LC_MONETARY locale uses a different encoding than the LC_CTYPE locale, thread A can get the LC_MONETARY encoding. This is because of how locale.localeconv() is currently implemented: it temporarily changes LC_CTYPE to the LC_MONETARY locale to decode the monetary fields of the localeconv() result.

I would prefer that Python use the same encoding for the whole lifetime of the process, from beginning to end. The Python filesystem encoding is a good choice for that. It's the same as locale.getpreferredencoding(False) (currently used by open() and friends), but the two become different if LC_CTYPE is changed (temporarily or permanently).

----------
Python tracker <https://bugs.python.org/issue43552>
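The runtime dependency on LC_CTYPE described above can be observed directly. A minimal sketch, assuming a Unix platform (locale.nl_langinfo() is unavailable on Windows) and that the "C" locale exists, as POSIX requires: the hypothetical helper codeset_under_lc_ctype() switches LC_CTYPE, takes a snapshot of the encodings, then restores the original locale. Outside the UTF-8 Mode, getpreferredencoding(False) follows the new LC_CTYPE, whereas sys.getfilesystemencoding() keeps the value chosen at startup. Like localeconv(), this changes a process-wide setting, so it is itself unsafe to run while other threads call open().

```python
import locale
import sys


def codeset_under_lc_ctype(tmp_locale: str = "C") -> dict:
    """Report encodings before, during and after a temporary LC_CTYPE switch.

    Hypothetical helper for illustration. Assumes a Unix platform
    (locale.nl_langinfo() does not exist on Windows) and that tmp_locale
    is installed; POSIX guarantees the "C" locale.
    """
    def snapshot() -> tuple:
        # (current LC_CTYPE codeset, getpreferredencoding(False), fs encoding)
        return (locale.nl_langinfo(locale.CODESET),
                locale.getpreferredencoding(False),
                sys.getfilesystemencoding())

    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    before = snapshot()
    try:
        # Process-wide change: any thread calling open() now sees it too
        locale.setlocale(locale.LC_CTYPE, tmp_locale)
        during = snapshot()
    finally:
        # Restore the original LC_CTYPE even if something above failed
        locale.setlocale(locale.LC_CTYPE, saved)
    return {"before": before, "during": during, "after": snapshot()}


if __name__ == "__main__" and hasattr(locale, "nl_langinfo"):
    for phase, (codeset, preferred, fs) in codeset_under_lc_ctype().items():
        print(f"{phase:>6}: codeset={codeset} preferred={preferred} fs={fs}")
```

Under a UTF-8 locale and without -X utf8, the "during" line is expected to show an ASCII codeset name (for example ANSI_X3.4-1968 on glibc; the exact name is platform dependent), while the filesystem encoding stays the same in all three snapshots.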