Portable locale usage
Hi,

I am musing on how to write portable Python 3 code that takes advantage of the standard locale module.

For instance, it would be very nice if we could say something like:

    # does not work!
    myISOCountryCode = 'hr'
    locale.setlocale(locale.LC_ALL, (myISOCountryCode, locale.getpreferredencoding()))

Up to now, I have found ways to set the locale on Linux and Windows:

    import locale
    locale.setlocale(locale.LC_ALL, 'hr_HR.utf8')    # works on Linux
    locale.setlocale(locale.LC_ALL, 'hrv_HRV.1250')  # works on Windows

I have noticed that locale defines a dictionary, locale.locale_alias, and that it contains the following promising keys: 'hr_hr', 'hrvatski', 'hr'. Unfortunately, on both Windows and Linux all these keys are bound to the same outdated string 'hr_HR.ISO8859-2'.

My questions are the following:

1. Is there a way to write portable Python code dealing with locales (as sketched at the beginning)?
2. If not, is there anything wrong with that idea?
3. What is the status of locale.locale_alias (the official documentation does not mention it)?

Cheers,
Sinisa

http://www.zemris.fer.hr/~ssegvic/index_en.html
--
http://mail.python.org/mailman/listinfo/python-list
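The alias lookup described in the message can be checked on any installation with a few lines. This is only a sketch: the helper name `alias_targets` is hypothetical, and since `locale.locale_alias` is an undocumented dictionary, its contents may differ between Python versions.

```python
import locale


def alias_targets(keys):
    """Map each key to its entry in locale.locale_alias, or None if absent."""
    return {key: locale.locale_alias.get(key) for key in keys}


if __name__ == "__main__":
    # The three keys mentioned in the message above.
    for key, target in alias_targets(["hr", "hr_hr", "hrvatski"]).items():
        print(key, "->", target)
```

Printing the targets on both platforms is a quick way to confirm whether the table is really identical on Windows and Linux.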
Re: Portable locale usage
On 6 Sep, 13:16, Thomas Jollans wrote:
> > locale.setlocale(locale.LC_ALL, (myISOCountryCode,
> > locale.getpreferredencoding()))
>
> As far as I can tell, this does work. Can you show us a traceback?

Sorry, I was imprecise. I wanted to say that the above snippet does not work on both Windows and Linux: it fails on Windows.

This is what I get on Windows:

    >>> import sys
    >>> sys.version
    '3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)]'
    >>> myISOCountryCode='hr'
    >>> locale.setlocale(locale.LC_ALL, (myISOCountryCode, locale.getpreferredencoding()))
    Traceback (most recent call last):
      File "", line 1, in
        locale.setlocale(locale.LC_ALL, (myISOCountryCode, locale.getpreferredencoding()))
      File "C:\apps\Python32\lib\locale.py", line 538, in setlocale
        return _setlocale(category, locale)
    locale.Error: unsupported locale setting

The snippet actually works on Linux, as you note.

> It looks like you don't actually care about the encoding: in your first
> example, you use the default system encoding, which you do not control,
> and in your second example, you're using two different encodings on the
> two platforms.

That's true. That's because currently I care most about lists of strings being sorted properly (see below).

Nevertheless, it *appears* to me that, in the Unicode era, locales could well be decoupled from particular encodings. But this is another topic.

> So why do you care whether or not the default uses ISO 8859-2 ?

It's not that I care about the encoding, it's that Windows throws locale.Error at me :-)

> > My questions are the following:
> >
> > 1. Is there a way for writing portable Python code dealing with
> > locales
> > (as sketched in the beginning)?
> >
> > 2. If not, is there anything wrong with that idea?
>
> As I said, I believe the above code should work. It works on my Linux
> system.
>
> What are you attempting to achieve with this setting of the locale,
> without even setting the encoding? Doesn't it make more sense to simply
> use the user's usual locale, and interact with them on their own terms?

For the moment, I only wish to properly sort a Croatian text file on both Windows and Linux (I am a cautious guy, I like reachable goals). When the locale is properly set, sorting works like a charm with mylist.sort(key=locale.strxfrm).

My current solution to the portability problem is:

    import locale
    try:
        locale.setlocale(locale.LC_ALL, 'hr_HR.utf8')             # Linux
    except locale.Error:
        locale.setlocale(locale.LC_ALL, 'Croatian_Croatia.1250')  # Windows

Thanks for your feedback!

Sinisa
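The try/except workaround in the message above generalizes to a list of candidate names. The following is a minimal sketch, not an established recipe: the function names `set_first_available_locale` and `sort_for_current_locale` are hypothetical, and which locale names a given system accepts is an assumption that has to be checked on that system.

```python
import locale


def set_first_available_locale(candidates, category=locale.LC_ALL):
    """Try each locale name in order; return the first one the OS accepts.

    Returns None (leaving the locale unchanged) if every candidate fails.
    """
    for name in candidates:
        try:
            locale.setlocale(category, name)
        except locale.Error:
            continue  # this platform does not know the name; try the next one
        return name
    return None


def sort_for_current_locale(words):
    """Return the words sorted by the collation rules of the current locale."""
    return sorted(words, key=locale.strxfrm)
```

A caller would pass the two names from the thread, for example `set_first_available_locale(['hr_HR.utf8', 'Croatian_Croatia.1250'])`, and treat a `None` result as "no Croatian locale installed" instead of letting `locale.Error` propagate.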
Re: Portable locale usage
On 6 Sep, 15:13, Vlastimil Brom wrote:
> There may be some differences between OSes and the versions, but using
> python 2.7 and 3.2 on Win XP and Win7 (Czech)
> I get the following results for setlocale:
>
> >>> locale.setlocale(locale.LC_ALL,'Croatian')
> 'Croatian_Croatia.1250'
> >>> locale.getlocale()
> ('Croatian_Croatia', '1250')
> >>> locale.getpreferredencoding(do_setlocale=False)
> 'cp1250'
>
> However, "hr" is not recognised on these systems:
>
> >>> locale.setlocale(locale.LC_ALL, "hr")
> Traceback (most recent call last):
>   File "", line 1, in
>   File "locale.pyc", line 531, in setlocale
> Error: unsupported locale setting

Thanks for your feedback!

So this works only on Linux (in accordance with the documentation):

    locale.setlocale(locale.LC_ALL, ('croatian', locale.getpreferredencoding()))

And this works only on Windows (the incomplete locale spec is probably filled in by the Windows API):

    locale.setlocale(locale.LC_ALL, 'croatian')

Obviously, there is a misunderstanding between Python, which uses standard (IANA) language codes, and Windows, which, as usual, has its own ways :-(

One possible solution would be to change locale.locale_alias on Windows so that it honors the custom Windows conventions:

    'hr' -> 'Croatian_Croatia.1250'

instead of

    'hr' -> 'hr_HR.ISO8859-2'

In addition, locale.getpreferredencoding() should probably be changed so that it returns valid Windows encodings ('1250' instead of 'cp1250').

Cheers,
Sinisa
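The 'cp1250' versus '1250' mismatch noted above can be bridged in user code. This is a sketch under stated assumptions: `windows_locale_name` is a hypothetical helper, and the "Language_Country.codepage" form is taken from the names reported in this thread, not from any Python API.

```python
def windows_locale_name(language_country, python_encoding):
    """Build a Windows-style locale name such as 'Croatian_Croatia.1250'.

    Python reports code pages as 'cp1250'; the Windows names seen in this
    thread use the bare number, so a leading 'cp' is dropped when the rest
    of the encoding name is purely numeric.
    """
    codepage = python_encoding
    if python_encoding.lower().startswith("cp") and python_encoding[2:].isdigit():
        codepage = python_encoding[2:]
    return f"{language_country}.{codepage}"
```

For example, `windows_locale_name('Croatian_Croatia', 'cp1250')` yields `'Croatian_Croatia.1250'`, the string that setlocale accepted on Windows in the session quoted above. Encoding names that are not of the `cpNNNN` form are passed through unchanged.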
Re: Portable locale usage
On 6 Sep, 22:58, garabik-news-2005...@kassiopeia.juls.savba.sk wrote:
> Thomas Jollans wrote:
> > It looks like you don't actually care about the encoding: in your first
> > example, you use the default system encoding, which you do not control,
> > and in your second example, you're using two different encodings on the
> > two platforms. So why do you care whether or not the default uses ISO
> > 8859-2 ?
>
> Maybe because using 8859-2 locale, (unicode) strings not representable in the
> encodings will be sorted - how?

Exactly. Additionally, fonts supporting 8859-2 are scarce. My favourite fonts were never available in 8859-2.

Sinisa
Re: Portable locale usage
On 6 Sep, 17:53, Thomas Jollans wrote:
> On 06/09/11 16:46, ssegvic wrote:
> > For the moment, I only wish to properly sort a Croatian text file
> > both on Windows and Linux (I am a cautious guy, I like reachable
> > goals).
> > When the locale is properly set, sorting works like a charm
> > with mylist.sort(key=locale.strxfrm).
>
> The problem with that is of course that a Croatian locale has to be
> installed. Many Linux systems don't have locales that aren't used.

It appears we did not understand each other completely.

Python locales on Linux work as advertised; I have no problems with locales on Linux whatsoever (yes, the Croatian locale had to be manually installed).

On the other hand, it appears that Python locales on Windows do not work as advertised. Consider, for instance, my initial example:

    locale.setlocale(locale.LC_ALL, ('hr', locale.getpreferredencoding()))

The code above does not work on Windows even though the fine manual says:

http://docs.python.org/py3k/library/locale.html
'''
locale.setlocale(category, locale=None)
...
If (the locale) is a tuple, it is converted to a string using the locale aliasing engine.
...
'''

I do not believe my troubles could be solved by installing anything, since the OS support for Croatian appears to be present:

    locale.setlocale(locale.LC_ALL, 'Croatian_Croatia.1250')

To conclude, it seems to me that the Windows implementation of the locale aliasing engine has some room for improvement.

All further comments shall be greatly appreciated :-)

Cheers,
Sinisa
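To see which string the aliasing engine would hand to the operating system for a given tuple, the conversion can be approximated outside of setlocale. The sketch below rests on an assumption that the thread does not confirm: that CPython joins the tuple as "language.encoding" and passes the result through the documented `locale.normalize()`. The helper name `resolve_locale_tuple` is hypothetical.

```python
import locale


def resolve_locale_tuple(language, encoding=None):
    """Approximate the string that setlocale derives from a (language, encoding) tuple.

    Assumption: the tuple is joined with '.' and then normalized through the
    alias table; nothing here asks the OS whether the result is supported.
    """
    name = language if encoding is None else f"{language}.{encoding}"
    return locale.normalize(name)


if __name__ == "__main__":
    print(resolve_locale_tuple("hr", locale.getpreferredencoding(False)))
```

If the printed name is in the POSIX "hr_HR.something" form on Windows as well, that would be consistent with the locale.Error reported above, since the Windows names accepted in this thread follow the "Croatian_Croatia.1250" pattern instead.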
Re: Portable locale usage
On 6 Sep, 17:53, Thomas Jollans wrote:
> On 06/09/11 16:46, ssegvic wrote:
> > For the moment, I only wish to properly sort a Croatian text file
> > both on Windows and Linux (I am a cautious guy, I like reachable
> > goals).
> > When the locale is properly set, sorting works like a charm
> > with mylist.sort(key=locale.strxfrm).
>
> The problem with that is of course that a Croatian locale has to be
> installed. Many Linux systems don't have locales that aren't used.

I have already concluded that on Linux there are no problems whatsoever (the Croatian locale was kindly installed by the distribution setup).

Since my initial snippet does not work on Windows, I would conclude that the locale aliasing engine on Windows should be improved. Any opposing views will be appreciated :-)

For convenience, I repeat the snippet here:

    import locale
    locale.setlocale(locale.LC_ALL, ('hr', locale.getpreferredencoding()))

Cheers,
Sinisa