STINNER Victor <vstin...@python.org> added the comment:

Attached encodings.py lists the different "locale encodings" used by Python. 
Example:
---
$ LANG=fr_FR ./python -X utf8 encodings.py fr_FR@euro
Set LC_CTYPE to 'fr_FR@euro'

LC_ALL env var: ''
LC_CTYPE env var: ''
LANG env var: 'fr_FR'
LC_CTYPE locale: 'fr_FR@euro'
Coerce C locale: 0
Python UTF-8 Mode: 1

(1) Python FS encoding
sys.getfilesystemencoding(): 'utf-8'

(2) Python locale encoding
_locale._get_locale_encoding(): 'UTF-8'
locale.getpreferredencoding(False): 'UTF-8'

(3) Current locale encoding
locale.get_current_locale_encoding(): 'ISO-8859-15'

(4) And more encodings for more fun!
locale.getdefaultlocale()[1]: 'ISO8859-1'
locale.getpreferredencoding(True): 'UTF-8'
---

Python starts with the LC_CTYPE locale set to fr_FR (ISO-8859-1), then the 
script sets the LC_CTYPE locale to fr_FR@euro (ISO-8859-15). The Python UTF-8 
Mode is enabled explicitly. We get a funny combination of no fewer than 3 
encodings!

* UTF-8
* ISO-8859-1
* ISO-8859-15

Which one is the correct one? Well... It depends :-)

(1) The Python filesystem encoding is used to call almost all operating system 
functions: encode to the OS and decode from the OS. Filenames, environment 
variables, command line options, etc.
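
As a minimal sketch of (1), the snippet below reads the filesystem encoding and 
its error handler, then round-trips a filename through os.fsencode() and 
os.fsdecode(). The filename "report.txt" is only an illustrative value; it is 
kept ASCII so that the example behaves the same whatever the locale is.

```python
import os
import sys

# Encoding and error handler that Python applies at the OS boundary.
fs_encoding = sys.getfilesystemencoding()
fs_errors = sys.getfilesystemencodeerrors()

# Illustrative filename (an assumption, not taken from encodings.py).
name = "report.txt"

# os.fsencode() and os.fsdecode() use the filesystem encoding and its
# error handler, the same pair used for filenames and environment variables.
encoded = os.fsencode(name)
decoded = os.fsdecode(encoded)

print(f"filesystem encoding: {fs_encoding!r} (errors={fs_errors!r})")
print(f"{name!r} -> {encoded!r} -> {decoded!r}")
```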

(2) The "Python" locale encoding is used by open() when no encoding is specified.
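
A small sketch of (2), assuming a writable temporary directory: it opens a file 
without an encoding argument and compares the encoding picked by open() with 
locale.getpreferredencoding(False). The file name "demo.txt" is made up for the 
example. The names may differ in case, so a real comparison would normalize 
them first, for example with codecs.lookup().

```python
import locale
import os
import tempfile

# The "Python" locale encoding: what open() falls back to.
default_encoding = locale.getpreferredencoding(False)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # illustrative file name

    # No encoding given: open() uses the Python locale encoding.
    with open(path, "w") as f:
        used_encoding = f.encoding

    # Explicit encoding: independent of the locale.
    with open(path, "w", encoding="utf-8") as f:
        explicit_encoding = f.encoding

print(f"getpreferredencoding(False): {default_encoding!r}")
print(f"open() without encoding:     {used_encoding!r}")
print(f"open(encoding='utf-8'):      {explicit_encoding!r}")
```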

(3) The current locale encoding is used by a limited number of functions that 
I listed in msg389063. Most users should not use it.
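
One way to look at (3) from Python code is locale.nl_langinfo(locale.CODESET), 
which asks the C library for the encoding of the current LC_CTYPE locale. This 
is only a sketch: current_locale_encoding() is a hypothetical helper, not the 
function used by encodings.py, and nl_langinfo() is not available on every 
platform (it is missing on Windows, for example), hence the hasattr() check.

```python
import locale


def current_locale_encoding():
    """Hypothetical helper: encoding of the current LC_CTYPE locale.

    Returns None where locale.nl_langinfo() is not available.
    """
    if hasattr(locale, "nl_langinfo") and hasattr(locale, "CODESET"):
        return locale.nl_langinfo(locale.CODESET)
    return None


current = current_locale_encoding()
print(f"LC_CTYPE locale:         {locale.setlocale(locale.LC_CTYPE)!r}")
print(f"current locale encoding: {current!r}")
```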

(4) locale.getpreferredencoding(True) is a weird beast. It returns the Python 
locale encoding until setlocale(LC_CTYPE, locale) is called for the first time. 
But it can still return the same encoding afterwards if the Python UTF-8 Mode 
is enabled. I'm not sure in which category we should put this function :-(
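
To see how the two variants behave in a given setup, they can simply be called 
side by side. This is a rough sketch, not a test of every case described above: 
normalized() is a hypothetical helper that uses codecs.lookup() so that 
spellings such as 'UTF-8' and 'utf-8' compare equal.

```python
import codecs
import locale
import sys


def normalized(name):
    """Hypothetical helper: canonical codec name, e.g. 'UTF-8' -> 'utf-8'."""
    return codecs.lookup(name).name


# do_setlocale=False: never touches the LC_CTYPE locale.
without_setlocale = locale.getpreferredencoding(False)

# do_setlocale=True: may temporarily call setlocale(LC_CTYPE, "").
with_setlocale = locale.getpreferredencoding(True)

print(f"Python UTF-8 Mode:           {sys.flags.utf8_mode}")
print(f"getpreferredencoding(False): {without_setlocale!r}")
print(f"getpreferredencoding(True):  {with_setlocale!r}")
print("same codec:", normalized(without_setlocale) == normalized(with_setlocale))
```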

(4 bis) locale.getdefaultlocale()[1] is the only function returning the 
ISO-8859-1 encoding. This encoding is not used by any function. I'm not sure of 
the purpose of this function. It sounds confusing.


I suggest deprecating locale.getpreferredencoding(True).

I'm not sure what to do with locale.getdefaultlocale(). Should we deprecate it? 
I never used this function. How is it used? For which purpose?

I understand that in 2000, locale.getdefaultlocale() was useful as a way to 
avoid calling setlocale(LC_CTYPE, ""). But Python 3 has called 
setlocale(LC_CTYPE, "") by default at startup since its early versions, and has 
done so on all platforms since Python 3.8. Moreover, the internal database of 
locale.getdefaultlocale() seems to be outdated and is painful to maintain 
(especially if we consider all platforms supported by Python, not only Linux: 
there are many issues on macOS).

----------
Added file: https://bugs.python.org/file49894/encodings.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43552>
_______________________________________