Marc-Andre Lemburg added the comment:

On 10.03.2017 08:37, Benjamin Peterson wrote:
>
> Do you believe this program should work?
>
> import locale, os
> for l in open("/usr/share/i18n/SUPPORTED"):
>     alias, encoding = l.strip().split()
>     locale.setlocale(locale.LC_ALL, alias)
>     try:
>         enc = locale.getlocale()[1]
>     except ValueError:
>         continue # not in table
>     normalized = enc.replace("ISO", "ISO-"). \
>         replace("_", "-"). \
>         replace("euc", "EUC-"). \
>         replace("big5", "big5-").upper()
>     assert normalized == locale.nl_langinfo(locale.CODESET)
>
> After my change it does: the encoding returned from getlocale() is the one
> actually being used by glibc. It fails dramatically on earlier versions of
> Python (for example on the en_IN example from #29571.) I don't understand why
> Python needs to editorialize whatever choices libc or the system
> administrator has made.
Your program essentially tests which alias is configured on your particular system. It will fail on older systems (with a different version of SUPPORTED, or none at all), on systems that do not have all locales installed, on systems that use the X.org aliases table as their basis rather than glibc's list of supported locales, and on systems with custom alias tables.

What we want in Python is a consistent mapping of aliases to locales across all (Unix-based) Python installations, just like what we have for encoding aliases. Those mappings should be taken from an established alias database, not from a list of default installations on some glibc version.

Also note that a lot of these discussions are really academic, since locales should always be specified with an encoding. While Unix gravitates toward UTF-8 for all system-related things, users still use other encodings a lot in their daily work, as you can see in the X.org aliases file. This is why defaulting to UTF-8 for locales (as is done, e.g., for many locales in the default glibc installs) is not a good idea. Locales affect user work products: what's fine for command-line interfaces or piping is not necessarily fine for, e.g., documents created by users.

So to answer your question: no, I don't believe that SUPPORTED has any authority for our purposes, and thus I don't think the program can be considered a valid test case. The SUPPORTED file can serve as an extra resource for fixing bugs in the table, but nothing more.

> Is getlocale() expected to return something different from the underlying C
> locale?

getlocale() will return whatever is currently configured via setlocale(). Of course, if you don't provide the encoding when calling setlocale(), it can return something different from the default installation encoding listed in some glibc's SUPPORTED file, but it will always default to the same locale and encoding on every platform where you run Python.
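To illustrate the consistency argument, here is a minimal sketch using locale.normalize(), which consults Python's own alias table (locale.locale_alias) instead of the C library. It never calls setlocale(), so its answer does not depend on which locales happen to be installed. The specific results noted below are what I'd expect from the table in current CPython releases; individual entries may differ between Python versions.

```python
import locale

# locale.normalize() is a pure-Python lookup in locale.locale_alias.
# It does not call setlocale() and does not ask the C library, so the
# result is the same on every platform, whatever locales are installed.
for name in ("en_US", "en_US.UTF-8", "en_US.utf8"):
    print(f"{name!r:15} -> {locale.normalize(name)!r}")
```

A bare 'en_US' picks up the table's default encoding (ISO8859-1), while names that already carry an encoding keep it, with only its spelling normalized. This is why specifying the encoding explicitly makes the table's guess unnecessary.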
> In fact, why have this table at all instead of using nl_langinfo to return
> the encoding for the current locale?

The table is meant to normalize locale names and, where necessary, enrich them with default encodings from a well-known database of such aliases. As mentioned above, the locale setting should ideally include the encoding as well, so that such guesses are not necessary.

Regarding nl_langinfo(): it only reports the user's locale settings after setlocale() has been called, since a process always starts up in the C locale until it makes this call. If calling setlocale() to test the default locale settings is not a problem for you (e.g. Python is not embedded, no other threads are running, no APIs that use locale information have been called yet, setlocale() was already called to set up the locale, etc.), you can use the approach taken by getpreferredencoding(), which is to temporarily set the locale to the default.

Going forward, I think that the following changes make sense:

* switching from ISO8859-1 to ISO8859-15 (the -15 version adds the Euro sign)
* casing changes, e.g. 'zh_CN.gb2312' to 'zh_CN.GB2312'
* fixes that undo the removal of modifiers, e.g. changing 'uz_uz@cyrillic' -> 'uz_UZ.UTF-8' to 'uz_uz@cyrillic' -> 'uz_UZ.UTF-8@cyrillic'

As for the other changes: please undo them, and also revert the unconditional use of glibc mappings overriding the X.org ones, as mentioned earlier in the thread. We can re-add some of the modifications later on if there's evidence that they actually make sense.

Thanks,
--
Marc-Andre Lemburg
eGenix.com

Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20087>
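The getpreferredencoding()-style approach mentioned in the reply (temporarily switch to the default locale, ask the C library, then restore) can be sketched roughly as follows. This assumes a Unix system, since locale.nl_langinfo() is not available on Windows, and that nothing else in the process relies on LC_CTYPE while it runs. The helper name default_codeset() is hypothetical. This is an illustration of the idea, not the actual getpreferredencoding() source.

```python
import locale

def default_codeset():
    """Return the codeset of the user's default LC_CTYPE locale.

    Hypothetical helper: temporarily switch LC_CTYPE to the default
    taken from the environment (LANG / LC_* variables), ask the C
    library via nl_langinfo(), then restore the previous setting.
    Not thread-safe: setlocale() changes process-wide state.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        try:
            locale.setlocale(locale.LC_CTYPE, "")  # environment default
        except locale.Error:
            pass  # default locale not installed: report the current one
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # always restore

print(default_codeset())
```

The finally clause guarantees that the previous LC_CTYPE setting is restored even if nl_langinfo() fails. The caveats listed above (embedding, other threads, locale-dependent APIs already in use) still apply, because the temporary switch is visible to the whole process.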