STINNER Victor <vstin...@python.org> added the comment:

I created this issue while reviewing the implementation of PEP 597: PR
19481.

Copy of my comments on the PR related to this issue.


_locale.get_locale_encoding() calls _Py_GetLocaleEncoding(), which returns UTF-8
if the Python UTF-8 Mode is enabled.
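A minimal sketch that makes this visible from Python (Unix only, since locale.nl_langinfo() does not exist on Windows). It uses the public locale.getpreferredencoding(False), which also reports UTF-8 when the UTF-8 Mode is enabled, rather than the private _locale function from the PR. A child interpreter is started with -X utf8 (or -X utf8=0) and prints the UTF-8 Mode flag, the encoding Python reports, and the codeset of the current LC_CTYPE locale. The helper name probe() is hypothetical, and the third value depends on the environment's locale settings.

```python
import subprocess
import sys

# Child script: UTF-8 Mode flag, the encoding Python reports as the
# "locale encoding", and the codeset of the current LC_CTYPE locale.
PROBE = (
    "import locale, sys; "
    "print(sys.flags.utf8_mode); "
    "print(locale.getpreferredencoding(False)); "
    "print(locale.nl_langinfo(locale.CODESET))"
)


def probe(*xoptions):
    """Run PROBE in a child interpreter with the given -X options.

    Hypothetical helper; returns the three printed lines as strings.
    """
    args = [sys.executable]
    for opt in xoptions:
        args += ["-X", opt]
    result = subprocess.run(args + ["-c", PROBE],
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    # With the UTF-8 Mode on, the second value is UTF-8 whatever the
    # third value (the real LC_CTYPE codeset) says.
    print("UTF-8 Mode on: ", probe("utf8"))
    print("UTF-8 Mode off:", probe("utf8=0"))
```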

Maybe the function could have a flag: please don't lie to me and return the 
current locale encoding ;-)

Or we could add a function to get the *current* locale encoding: 
**locale.get_current_locale_encoding()**.

This one would ignore the UTF-8 Mode and call nl_langinfo(CODESET). There are
APIs that use the *current* locale encoding:
PyUnicode_EncodeLocale/PyUnicode_DecodeLocale and
_Py_EncodeLocaleEx/_Py_DecodeLocaleEx with current_locale=1. You can see which
functions use them:

* decode tm_zone field of localtime_r() and gmtime()
* decode tzname[0] and tzname[1] strings
* decode setlocale() result
* decode some localeconv() fields (this function has to temporarily switch to a
different locale encoding, which is bad!)
* decode nl_langinfo() result
* decode gettext(), dgettext(), dcgettext(), textdomain(), bindtextdomain(), 
bind_textdomain_codeset() result
* decode strerror() and dlerror() result
* encode/decode in the readline module
* encode the format string for strftime() in time.strftime() (only used on
Windows; Unix provides wcsftime()) and then decode the strftime() result
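locale.get_current_locale_encoding() is only a proposal, not an existing function. As a rough sketch of what it would return on Unix, a pure-Python version can read nl_langinfo(CODESET) directly, which the UTF-8 Mode does not affect. The function name is taken from the proposal; falling back to UTF-8 when the codeset is empty is an assumption of this sketch.

```python
import locale


def get_current_locale_encoding():
    """Sketch of the proposed function: codeset of the *current* LC_CTYPE locale.

    Hypothetical and Unix only (locale.nl_langinfo() does not exist on
    Windows). It ignores the UTF-8 Mode, and its result can change whenever
    LC_CTYPE is changed with locale.setlocale().
    """
    codeset = locale.nl_langinfo(locale.CODESET)
    # Assumption: treat an empty codeset (e.g. an unsupported LC_CTYPE
    # locale) as UTF-8.
    return codeset or "UTF-8"
```

Because it reads the locale at call time, two calls separated by a setlocale(LC_CTYPE, ...) can return different encodings, which is exactly the "current" semantics the functions listed above rely on.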


> encoding="locale" : Uses locale encoding regardless UTF-8 mode.

Currently, open(encoding=None) doesn't work like that. For example, on macOS,
Android and VxWorks, it always uses UTF-8. And if the UTF-8 Mode is enabled,
UTF-8 is used as well.
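The UTF-8 Mode case can be checked with a small sketch: a child interpreter started with -X utf8 opens os.devnull in text mode without an encoding argument and reports which codec open() picked. default_open_encoding() is a hypothetical helper, and the macOS, Android and VxWorks cases are not exercised here.

```python
import codecs
import subprocess
import sys

# Child script: which encoding does open() choose when encoding=None?
CHILD = (
    "import os; "
    "f = open(os.devnull); "
    "print(f.encoding); "
    "f.close()"
)


def default_open_encoding(*xoptions):
    """Return the normalized codec name open() picks in a child interpreter
    started with the given -X options (hypothetical helper)."""
    args = [sys.executable]
    for opt in xoptions:
        args += ["-X", opt]
    out = subprocess.run(args + ["-c", CHILD],
                         capture_output=True, text=True, check=True)
    return codecs.lookup(out.stdout.strip()).name


if __name__ == "__main__":
    # With the UTF-8 Mode enabled, the LC_CTYPE locale does not matter.
    print(default_open_encoding("utf8"))  # utf-8
```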

In PEP 597, I read that encoding="locale" is the same as encoding=None, except
that it doesn't emit an EncodingWarning. Where does PEP 597 change the chosen
encoding for the encoding=None case? The PEP says "locale encoding" without
specifying exactly what it is. In Python, it means different things depending
on the context. There is a subtle difference between the **current** locale
encoding and "the locale encoding". I agree that it needs some clarification :-)
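Assuming Python 3.10 or newer (where EncodingWarning and encoding="locale" are available), the part of the PEP that is unambiguous can be checked directly: with -X warn_default_encoding, open() without an encoding emits an EncodingWarning while encoding="locale" does not. This sketch deliberately does not compare the encodings the two calls choose, since that is precisely the unclear part. encoding_warning_counts() is a hypothetical helper.

```python
import subprocess
import sys

# Child script (Python 3.10+): count EncodingWarnings emitted by open()
# with encoding=None and with encoding="locale".
CHILD = """
import os, warnings

def count_warnings(**kwargs):
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        with open(os.devnull, **kwargs):
            pass
    return sum(issubclass(w.category, EncodingWarning) for w in caught)

print(count_warnings(), count_warnings(encoding="locale"))
"""


def encoding_warning_counts():
    """Run CHILD with -X warn_default_encoding; return (implicit, explicit)."""
    out = subprocess.run(
        [sys.executable, "-X", "warn_default_encoding", "-c", CHILD],
        capture_output=True, text=True, check=True,
    )
    implicit, explicit = map(int, out.stdout.split())
    return implicit, explicit
```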

While we are discussing encodings: I never understood why open() gets the
current locale encoding from nl_langinfo(CODESET), an encoding which can change
while Python is running. For example, if thread A calls open(filename,
encoding=None), thread B calls locale.localeconv(), and the LC_MONETARY locale
uses a different encoding than the LC_CTYPE locale, thread A can get the
LC_MONETARY encoding because of how locale.localeconv() is currently
implemented: it temporarily sets LC_CTYPE to the LC_MONETARY locale to decode
the monetary fields of the localeconv() result.
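The race itself is hard to trigger on demand, but the dependency behind it is easy to observe. On Linux with the UTF-8 Mode disabled, the encoding open() picks for encoding=None follows the LC_CTYPE locale in effect at the time of the call. This sketch switches LC_CTYPE to "C" in a child interpreter (so the caller's locale is untouched); it assumes a glibc-like system, and whether the "before" and "after" values differ depends on the starting locale. lc_ctype_switch() is a hypothetical helper.

```python
import subprocess
import sys

# Child script: the encoding open() picks before and after switching
# LC_CTYPE to "C" at runtime, plus the codeset nl_langinfo() reports after.
CHILD = """
import codecs, locale, os

def open_encoding():
    with open(os.devnull) as f:
        return codecs.lookup(f.encoding).name

before = open_encoding()
locale.setlocale(locale.LC_CTYPE, "C")
after = open_encoding()
current = codecs.lookup(locale.nl_langinfo(locale.CODESET)).name
print(before, after, current)
"""


def lc_ctype_switch():
    """Run CHILD with the UTF-8 Mode disabled; return (before, after, current)."""
    out = subprocess.run(
        [sys.executable, "-X", "utf8=0", "-c", CHILD],
        capture_output=True, text=True, check=True,
    )
    before, after, current = out.stdout.split()
    return before, after, current
```

After the switch, open() reports whatever nl_langinfo(CODESET) now returns, so any code that changes LC_CTYPE, even briefly, changes what a concurrent open(encoding=None) sees.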

I would prefer that Python use the same encoding for the whole lifetime of the
process, from start to finish. The Python filesystem encoding is a good choice
for that. It is the same as locale.getpreferredencoding(False) (currently used
by open() and friends), but the two diverge if the LC_CTYPE locale is changed
(temporarily or permanently).
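A minimal sketch of that preference as user code: a hypothetical open_text() wrapper that defaults to sys.getfilesystemencoding(), which is fixed at startup, instead of letting open() look up the locale encoding on every call. Both function names are illustrative only. The sketch keeps open()'s default strict error handler rather than the filesystem error handler.

```python
import locale
import sys


def process_text_encoding():
    """Hypothetical: one text encoding for the whole lifetime of the process.

    sys.getfilesystemencoding() is computed once at startup, so later
    locale.setlocale(locale.LC_CTYPE, ...) calls do not change it.
    """
    return sys.getfilesystemencoding()


def open_text(path, mode="r", **kwargs):
    """Hypothetical open() wrapper that ignores runtime LC_CTYPE changes."""
    if "b" not in mode and kwargs.get("encoding") is None:
        kwargs["encoding"] = process_text_encoding()
    return open(path, mode, **kwargs)


if __name__ == "__main__":
    before = process_text_encoding()
    saved = locale.setlocale(locale.LC_CTYPE)
    locale.setlocale(locale.LC_CTYPE, "C")
    try:
        # Unchanged, even though LC_CTYPE (and the locale encoding) changed.
        assert process_text_encoding() == before
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)
```

An explicit encoding= argument still wins; only the implicit default is pinned.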

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43552>
_______________________________________