On 19.03.2021 10:17, STINNER Victor wrote:
>
> New submission from STINNER Victor <vstin...@python.org>:
>
> I propose to add two new functions:
>
> * locale.get_locale_encoding(): it's exactly the same than
>   locale.getpreferredencoding(False).
>
> * locale.get_current_locale_encoding(): always get the current locale
>   encoding. Read the ANSI code page on Windows, or nl_langinfo(CODESET) on
>   other platforms. Ignore the UTF-8 Mode. Don't always return "UTF-8" on macOS,
>   Android, VxWorks.

I'm not sure whether this would improve the situation much.

The problem is that the locale module is meant to expose the lib C locale settings, but many of the recent additions actually do something completely different: they look into the process and user environment and try to determine external settings, which are not reflected in the lib C locale settings.

I had added locale.getdefaultlocale() to give applications a chance to determine the locale setting defined by the process environment *without* calling setlocale(LC_ALL, '') and causing problems in other threads. I used the X11 database for locale encodings, which was the closest thing to a standard for encodings at the time (around 2000). Part of the return value is the encoding that would be set.

Martin later added locale.getpreferredencoding(), which tries to determine the encoding in a new, different way, based on nl_langinfo(CODESET). As you mentioned, this intention was broken on several platforms by forcing UTF-8 as output. And in many cases, the API had to call setlocale() as well, causing the thread problems.

However, the problem with nl_langinfo(CODESET) is the same as with setlocale(): it returns the current state of the lib C settings, which may well point to the 'C' locale rather than the settings the user has configured in the OS environment. So while you get an encoding defined by lib C for the current locale settings (without guessing it as with locale.getdefaultlocale()), you still don't get what the user really wants to use.

Unfortunately, lib C does not provide a way to query the locale database without changing the locale settings at the same time. This is the main issue we're facing.

Now, the correct way in all this would be to just call setlocale(LC_ALL, '') at the start of the application and not try to apply all the magic to get around this. But this has to be done by the application and not by Python (which may well be embedded into some other application).

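To illustrate the point about nl_langinfo(CODESET) only reflecting the current lib C state, here is a minimal sketch. The helper names libc_encoding() and compare_c_and_user_locale() are hypothetical and made up for this example. It uses LC_CTYPE rather than LC_ALL, since that is the category the codeset depends on, and it assumes that falling back to locale.getpreferredencoding(False) is acceptable on platforms without nl_langinfo().

```python
import locale


def libc_encoding():
    """Encoding lib C currently reports; never calls setlocale() itself."""
    if hasattr(locale, "nl_langinfo"):
        return locale.nl_langinfo(locale.CODESET)
    # Assumption: no nl_langinfo() here (e.g. Windows), so fall back to
    # getpreferredencoding(False), which does not call setlocale() either.
    return locale.getpreferredencoding(False)


def compare_c_and_user_locale():
    """Return (encoding under 'C', encoding under the user's locale)."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        # Force the 'C' locale: lib C now knows nothing about the user.
        locale.setlocale(locale.LC_CTYPE, "C")
        in_c_locale = libc_encoding()
        try:
            # Load the settings from the process environment.
            locale.setlocale(locale.LC_CTYPE, "")
        except locale.Error:
            # The environment names a locale that lib C does not know.
            pass
        in_user_locale = libc_encoding()
    finally:
        # Restore whatever was active before the experiment.
        locale.setlocale(locale.LC_CTYPE, saved)
    return in_c_locale, in_user_locale


if __name__ == "__main__":
    print(compare_c_and_user_locale())
```

The two values may or may not differ, depending on the platform and the environment the process was started in. Note that the setlocale() calls in the sketch affect the whole process, so this is only suitable for experiments, not for use in threaded code.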
I'd suggest adding a single new API: locale.getencoding(), which interfaces to nl_langinfo(CODESET) or the Windows code page and does not try to do any magic, i.e. does *not* call setlocale(). It needs to return what lib C currently knows and uses as encoding.

locale.getpreferredencoding() should then be deprecated. It does not make sense to pretend to query information which is not really directly available from the lib C locale system.

And the documentation should point out that applications should call setlocale(LC_ALL, '') when they start up, if they want to get the lib C locale, and thus the Python locale module, set up to work based on what the user really wants -- instead of just guessing at this.

PS: The locale module normally does not use underscores in function names, so it's not a good idea to add more.

-- 
Marc-Andre Lemburg
eGenix.com