[issue20087] Mismatch between glibc and X11 locale.alias

Marc-Andre Lemburg Tue, 07 Mar 2017 10:29:20 -0800

Marc-Andre Lemburg added the comment:

On 07.03.2017 18:23, Serhiy Storchaka wrote:
> 
> Serhiy Storchaka added the comment:
> 
>> 'cy_GB.ISO8859-1' to 'cy_GB.ISO8859-14'
> 
> Looks as just fixing an error. The default West-European ISO8859-1 is changed 
> to Celtic cy_GB.ISO8859-14. This looks better option for Welsh.
> 
>> 'tg_TJ.KOI8-C' to 'tg_TJ.KOI8-T'
> 
> KOI8-C is not supported by Python, but KOI8-T is supported. I don't know what 
> KOI8-C means, there are several rarely used incompatible encodings with this 
> name.


While all this may make sense, I'm still missing the reasoning
behind the differences between X.org and glibc.

This change also looks strange:

-    'ka_ge':                                'ka_GE.GEORGIAN-ACADEMY',
+    'ka_ge':                                'ka_GE.GEORGIAN_PS',
     'ka_ge.georgianacademy':                'ka_GE.GEORGIAN-ACADEMY',
     'ka_ge.georgianps':                     'ka_GE.GEORGIAN-PS',
     'ka_ge.georgianrs':                     'ka_GE.GEORGIAN-ACADEMY',

Why is GEORGIAN_PS written with an underscore, whereas the other
mappings use dashes?
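
Spelling differences of this kind can be found mechanically. The following is only a sketch: the helper name inconsistent_encoding_spellings and the sample dictionary are made up for illustration (the entries are copied from the diff above), and it only compares the encoding part of each alias target after stripping "-" and "_".

```python
import re


def inconsistent_encoding_spellings(alias_table):
    """Group alias targets whose encoding differs only by '-' versus '_'."""
    groups = {}
    for target in alias_table.values():
        # Split "ka_GE.GEORGIAN-PS@mod" into the language part and the rest.
        _, dot, rest = target.partition(".")
        if not dot:
            continue  # no encoding part to compare
        encoding = rest.split("@", 1)[0]
        # Two spellings collide if they match once separators are removed.
        key = re.sub(r"[-_]", "", encoding).upper()
        groups.setdefault(key, set()).add(encoding)
    return {key: sorted(spellings)
            for key, spellings in groups.items()
            if len(spellings) > 1}


# Hypothetical sample built from the entries quoted in this message.
sample = {
    "ka_ge": "ka_GE.GEORGIAN_PS",
    "ka_ge.georgianacademy": "ka_GE.GEORGIAN-ACADEMY",
    "ka_ge.georgianps": "ka_GE.GEORGIAN-PS",
    "ka_ge.georgianrs": "ka_GE.GEORGIAN-ACADEMY",
}
print(inconsistent_encoding_spellings(sample))
```

Running the same helper over the full alias table would list every encoding that is spelled in more than one way.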

Or this one:

-    'fi_fi':                                'fi_FI.ISO8859-15',
+    'fi_fi':                                'fi_FI.ISO8859-1',

Why would a locale switch from an encoding that has
the Euro sign to one that lacks it?
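
The Euro sign difference is easy to confirm with Python's own codecs. This is a minimal sketch; can_encode is a hypothetical helper name, and the codec names "iso8859-15" and "iso8859-1" are the standard Python aliases for the two encodings in the diff.

```python
def can_encode(text, encoding):
    """Return True if text can be represented in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


EURO = "\u20ac"  # EURO SIGN

for encoding in ("iso8859-15", "iso8859-1"):
    print(encoding, can_encode(EURO, encoding))
```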

Or why is this Latin variant removed:

-    'nan_tw@latin':                         'nan_TW.UTF-8@latin',

Why should Russians switch back to ISO?

-    'ru_ru':                                'ru_RU.UTF-8',
+    'ru_ru':                                'ru_RU.ISO8859-5',

Or from ISO to KOI?

-    'russian':                              'ru_RU.ISO8859-5',
+    'russian':                              'ru_RU.KOI8-R',

The more I look at these changes, the more I believe we
should not simply accept everything we find in the X.org
and glibc alias files. Both obviously have bugs.
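
One way to make such a review systematic is to diff the old and new tables and look at every changed or removed entry individually. The sketch below assumes both tables are available as plain dictionaries; diff_alias_tables, old_table and new_table are hypothetical names, and the sample entries are the ones quoted above.

```python
def diff_alias_tables(old, new):
    """Return (added, removed, changed) between two alias dictionaries."""
    added = {key: new[key] for key in new.keys() - old.keys()}
    removed = {key: old[key] for key in old.keys() - new.keys()}
    changed = {key: (old[key], new[key])
               for key in old.keys() & new.keys()
               if old[key] != new[key]}
    return added, removed, changed


# Hypothetical excerpts of the table before and after the patch.
old_table = {
    "fi_fi": "fi_FI.ISO8859-15",
    "ru_ru": "ru_RU.UTF-8",
    "russian": "ru_RU.ISO8859-5",
    "nan_tw@latin": "nan_TW.UTF-8@latin",
}
new_table = {
    "fi_fi": "fi_FI.ISO8859-1",
    "ru_ru": "ru_RU.ISO8859-5",
    "russian": "ru_RU.KOI8-R",
}

added, removed, changed = diff_alias_tables(old_table, new_table)
for key in sorted(changed):
    print(key, changed[key][0], "->", changed[key][1])
for key in sorted(removed):
    print(key, "removed, was", removed[key])
```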

>> I also don't understand why some "xx.utf-8" locale mappings were removed - I 
>> don't think we should remove those, unless they are no longer needed due to 
>> some other logic implying these mappings.
> 
> The aliases table is a table of exceptions. Removed entries no longer are 
> exceptional.

It's not a table of exceptions; it's a table mapping commonly
used locale settings to ones which the C library understands :-)

But regardless, I checked the code and it is already
smart enough to convert spellings the C library does not
accept, such as "utf8", to "UTF-8", so these entries can indeed
be removed, but only if the locale is otherwise listed.
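
The behaviour can be tried out with locale.normalize(). This is a sketch only: show_normalization is a hypothetical helper, and "xx_YY" is a made-up language/territory pair chosen on the assumption that it is not in the alias table. On current CPython versions the first name should come back with the encoding spelled "UTF-8" and the second should come back unchanged, but the exact result depends on the alias table shipped with the interpreter.

```python
import locale


def show_normalization(names):
    """Map each requested locale name to what locale.normalize() returns."""
    return {name: locale.normalize(name) for name in names}


# "de_DE" is listed in the alias table; "xx_YY" is assumed not to be.
results = show_normalization(["de_DE.utf8", "xx_YY.utf8"])
for requested, normalized in results.items():
    print(requested, "->", normalized)
```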

In some cases, it's probably better to drop the ".utf8"
to have more generic mappings, e.g.

+    'bhb_in.utf8':                          'bhb_IN.UTF-8',

or

     'de_li.utf8':                           'de_LI.UTF-8',

though I'd expect that mapping to be:

     'de_li':                                'de_LI.ISO8859-1',

as for all other "de" entries.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20087>
_______________________________________