STINNER Victor <vstin...@python.org> added the comment:
Attached encodings.py lists the different "locale encodings" used by Python.

Example:
---
$ LANG=fr_FR ./python -X utf8 encodings.py fr_FR@euro
Set LC_CTYPE to 'fr_FR@euro'

LC_ALL env var: ''
LC_CTYPE env var: ''
LANG env var: 'fr_FR'
LC_CTYPE locale: 'fr_FR@euro'
Coerce C locale: 0
Python UTF-8 Mode: 1

(1) Python FS encoding
sys.getfilesystemencoding(): 'utf-8'

(2) Python locale encoding
_locale._get_locale_encoding(): 'UTF-8'
locale.getpreferredencoding(False): 'UTF-8'

(3) Current locale encoding
locale.get_current_locale_encoding(): 'ISO-8859-15'

(4) And more encodings for more fun!
locale.getdefaultlocale()[1]: 'ISO8859-1'
locale.getpreferredencoding(True): 'UTF-8'
---

Python starts with the LC_CTYPE locale set to fr_FR (ISO8859-1), then the script sets the LC_CTYPE locale to fr_FR@euro (ISO-8859-15). The Python UTF-8 Mode is enabled explicitly.

We get a funny combination of no fewer than 3 encodings!

* UTF-8
* ISO-8859-1
* ISO-8859-15

Which one is the correct one? Well... It depends :-)

(1) The Python filesystem encoding is used to call almost all operating system functions: encode to the OS and decode from the OS. Filenames, environment variables, command line options, etc.

(2) The "Python" locale encoding is used by open() when no encoding is specified.

(3) The current locale encoding is used by a limited number of functions that I listed in msg389063. Most users should not use it.

(4) locale.getpreferredencoding(True) is a weird beast. It is the same as the Python locale encoding until setlocale(LC_CTYPE, locale) is called for the first time. But it can remain the same if the Python UTF-8 Mode is enabled. I'm not sure in which category we should put this function :-(

(4 bis) locale.getdefaultlocale()[1] is the only function returning the ISO-8859-1 encoding. This encoding is not used by any function. I'm not sure of the purpose of this function. It sounds confusing.

I suggest deprecating locale.getpreferredencoding(True).

I'm not sure what to do with locale.getdefaultlocale().
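To illustrate categories (1) to (3) above, here is a minimal sketch. It is not the attached encodings.py; the helper name collect_encodings() and the dictionary labels are made up for this example, and only long-standing public APIs are called. It assumes that nl_langinfo(CODESET), where available, is a reasonable stand-in for the "current locale encoding" of (3).

```python
import locale
import sys


def collect_encodings():
    """Return a dict mapping a label to what each public API reports.

    Hypothetical helper for illustration only; not part of encodings.py.
    """
    result = {
        # (1) Encoding used for filenames, environment variables, argv, etc.
        "filesystem": sys.getfilesystemencoding(),
        # (2) Encoding used by open() when no encoding is given;
        #     this call honors the Python UTF-8 Mode.
        "preferred(False)": locale.getpreferredencoding(False),
        # Whether the Python UTF-8 Mode (-X utf8 or PYTHONUTF8=1) is enabled.
        "utf8_mode": bool(sys.flags.utf8_mode),
    }
    # (3) Assumption: nl_langinfo(CODESET) approximates the encoding of the
    #     current LC_CTYPE locale. It is not available on every platform
    #     (for example, not on Windows), so it is guarded.
    if hasattr(locale, "nl_langinfo") and hasattr(locale, "CODESET"):
        result["nl_langinfo(CODESET)"] = locale.nl_langinfo(locale.CODESET)
    return result


if __name__ == "__main__":
    for label, value in collect_encodings().items():
        print(f"{label}: {value!r}")
```

Running it with and without -X utf8, and under different LANG values, should show (1) and (2) following the UTF-8 Mode while the nl_langinfo() value follows the LC_CTYPE locale. The exact encoding names printed depend on the platform and its installed locales.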
Should we deprecate it? I have never used this function. How is it used? For which purpose?

I understand that in 2000, locale.getdefaultlocale() was interesting as a way to avoid calling setlocale(LC_CTYPE, ""). But Python 3 has called setlocale(LC_CTYPE, "") by default at startup since the early versions, and it has been called on all platforms since Python 3.8.

Moreover, the internal database of locale.getdefaultlocale() seems to be outdated and is painful to maintain (especially if we consider all platforms supported by Python, not only Linux: there are many issues on macOS).

----------
Added file: https://bugs.python.org/file49894/encodings.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43552>
_______________________________________