kj wrote:
I have read a *ton* of stuff on Unicode. It doesn't even seem all
that hard. Or so I think. Then I start writing code, and WHAM:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
(There, see? My Unicodephobia just went up a notch.)
Here's the thing: I don't even know how to *begin* debugging errors
like this. This is where I could use some help.
>>> a=u'\u0104'
>>>
>>> type(a)
<type 'unicode'>
>>>
>>> nu=a.encode('utf-8')
>>>
>>> type(nu)
<type 'str'>
See what I mean? You encode INTO a string (bytes, type str), and decode
OUT OF a string (back into unicode).
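To make the two directions concrete, here is a minimal round-trip sketch. The variable names (text, raw, back) are mine and purely illustrative; the same lines should run on Python 2.6+ and on Python 3.3+.

```python
# -*- coding: utf-8 -*-
text = u'\u0104'               # text: a single code point
raw = text.encode('utf-8')     # encode: text -> bytes (two bytes for this character)
back = raw.decode('utf-8')     # decode: bytes -> text

print(repr(raw))               # shows the two bytes 0xc4 0x84
print(back == text)            # the round trip gives back the original text
```

The rule of thumb: .encode() belongs on text, .decode() belongs on bytes, and the codec name has to match in both directions.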
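To make matters more complicated, in Python 2 str.encode() first DECODES
the byte string into unicode internally (using the default 'ascii' codec)
and only then encodes the result. That is why the call below fails with a
UnicodeDecodeError rather than an encode error: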
>>> nu
'\xc4\x84'
>>>
>>> type(nu)
<type 'str'>
>>> nu.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
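As a rough sketch of what happens behind the scenes, the hidden step can be written out by hand. The helper encode_like_py2_str is hypothetical (my own name, not a real API), and it assumes the default codec is 'ascii', which is the usual Python 2 setting. Catching the exception and looking at its attributes is also a reasonable first step when debugging errors like this.

```python
# -*- coding: utf-8 -*-
nu = u'\u0104'.encode('utf-8')    # the two bytes 0xc4 0x84

def encode_like_py2_str(byte_string, encoding='ascii'):
    # Hypothetical helper, not a real API: spells out the two steps
    # that Python 2's str.encode() performs implicitly.
    text = byte_string.decode('ascii')    # hidden step 1: bytes -> unicode
    return text.encode(encoding)          # step 2: unicode -> bytes

error = None
try:
    encode_like_py2_str(nu)
except UnicodeDecodeError as exc:
    error = exc
    # The exception itself says which codec failed, on which bytes, and where.
    print("codec: %s" % exc.encoding)
    print("offending byte(s): %r" % exc.object[exc.start:exc.end])
    print("position: %d" % exc.start)

# Pure-ASCII bytes survive the hidden decode, so the problem only
# shows up once non-ASCII data arrives.
print(encode_like_py2_str(b'abc') == b'abc')
```

If the codec reported is 'ascii' and you never asked for ASCII, some byte string is being converted to unicode implicitly somewhere; the usual fix is to decode it explicitly with the right codec at the point where the data enters the program.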
There's logic to this, although it makes my brain want to explode. :-)
Regards,
mk