On Fri, Nov 21, 2014 at 1:14 AM, Francis Moreau <francis.m...@gmail.com> wrote:
> Hi,
>
> Thanks for the "from __future__ import unicode_literals" trick, it makes
> that switch much less intrusive.
>
> However it seems that I will suddenly be trapped by all modules which
> are not prepared to handle unicode. For example:
>
> >>> from __future__ import unicode_literals
> >>> import locale
> >>> locale.setlocale(locale.LC_ALL, 'fr_FR')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib64/python2.7/locale.py", line 546, in setlocale
>     locale = normalize(_build_localename(locale))
>   File "/usr/lib64/python2.7/locale.py", line 453, in _build_localename
>     language, encoding = localetuple
> ValueError: too many values to unpack
>
> Is the locale module an exception and in that case I'll fix it by doing:
>
> >>> locale.setlocale(locale.LC_ALL, b'fr_FR')
>
> or is a (big) part of the modules in python 2.7 still not ready for
> unicode and in that case I have to decide which prefix (u or b) I should
> manually add ?

Sadly, there are quite a lot of parts of Python 2 that simply don't
handle Unicode strings. But you can probably keep all of those down to
just a handful of explicit b"whatever" strings; most places should
accept unicode as well as str. What you're seeing here is a prime
example of one of this author's points (caution, long post):

http://unspecified.wordpress.com/2012/04/19/the-importance-of-language-level-abstract-unicode-strings/

"""The lesson of Python 3 is: give programmers a Unicode string type,
*make it the default*, and encoding issues will /mostly/ go away."""

There's a whole ecosystem to Python 2 - some in the standard library,
heaps more in the rest of the world - and a lot of it was written on
the assumption that a byte is a character is an octet. When you pass
Unicode strings to functions written to expect byte strings, sometimes
you win, and sometimes you lose... even with the standard library
itself. But the Python 3 ecosystem has been written on the assumption
that strings are Unicode. It's only a narrow set of programs ("boundary
code", where you're moving text across networks and stuff like that)
where the Python 2 model is easier to work with; and the recent Py3
releases have been progressively working to relieve that pain.

The absolute worst case is a function which exists in Python 2 and 3,
and requires a byte string in Py2 and a text string in Py3. Sadly, that
may be exactly what locale.setlocale() is. For that, I would suggest
explicitly passing stuff through str():

    locale.setlocale(locale.LC_ALL, str('fr_FR'))

In Python 3, 'fr_FR' is already a str, so passing it through str() will
have no significant effect. (Though it would be worth commenting that,
to make it clear to a subsequent reader that this is Py2 compat code.)
In Python 2 with unicode_literals active, 'fr_FR' is a unicode, so
passing it through str() will encode it to ASCII, producing a byte
string that setlocale should be happy with.
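
To make the str() suggestion concrete, here is a minimal sketch that
should run unchanged on both 2.7 and 3.x. The names native_str() and
set_locale_compat() are hypothetical helpers made up for illustration,
not anything from the locale module, and the sketch assumes the locale
name is pure ASCII (str() on Py2 would raise UnicodeEncodeError
otherwise).

```python
from __future__ import unicode_literals

import locale


def native_str(text):
    """Return text as the interpreter's native str type (hypothetical helper).

    Py2: str() encodes a unicode object to an ASCII byte string.
    Py3: str() returns a text string unchanged.
    """
    return str(text)


def set_locale_compat(name, category=locale.LC_ALL):
    """Call locale.setlocale() with a native str (hypothetical wrapper).

    Returns the locale string reported by setlocale(), or None if the
    requested locale is not installed on this system.
    """
    try:
        # Py2 compat: setlocale() wants the native str type, not unicode.
        return locale.setlocale(category, native_str(name))
    except locale.Error:
        return None
```

Calling set_locale_compat('fr_FR') then behaves the same whether or not
unicode_literals is in effect; whether it returns a locale string or
None depends on which locales the machine has installed.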

By the way, the reason for the strange error message is clearer in
Python 3, which chains in another exception:

>>> locale.setlocale(locale.LC_ALL, b'fr_FR')
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/locale.py", line 498, in _build_localename
    language, encoding = localetuple
ValueError: too many values to unpack (expected 2)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/locale.py", line 594, in setlocale
    locale = normalize(_build_localename(locale))
  File "/usr/local/lib/python3.5/locale.py", line 507, in _build_localename
    raise TypeError('Locale must be None, a string, or an iterable of two strings -- language code, encoding.')
TypeError: Locale must be None, a string, or an iterable of two strings -- language code, encoding.

So when it gets the wrong type of string, it attempts to unpack it as
an iterable; it yields five values (the five bytes or characters,
depending on which way it's the wrong type of string), but it's
expecting two. Fortunately, str() will deal with this. But make sure
you don't have the b prefix, or str() in Py3 will give you quite a
different result!

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
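
For anyone who wants to see the two effects described above in
isolation, here is a small sketch. It does not call into the locale
module at all; it only reproduces the two-name unpacking that
_build_localename() attempts, and shows what str() does to a bytes
literal. The variable names are illustrative only.

```python
# Unpacking a five-character string into two names fails the same way
# _build_localename() does when it is handed the wrong kind of string.
try:
    language, encoding = 'fr_FR'
except ValueError as exc:
    # Py2: "too many values to unpack"; Py3 appends " (expected 2)".
    message = str(exc)

# On Python 3, str() of a bytes object returns its repr rather than
# decoding it, which is why the b prefix must be left off.
as_text = str(b'fr_FR')  # Py3: "b'fr_FR'"   Py2: 'fr_FR'
```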