kj wrote:
I have read a *ton* of stuff on Unicode.  It doesn't even seem all
that hard.  Or so I think.  Then I start writing code, and WHAM:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

(There, see?  My Unicodephobia just went up a notch.)

Here's the thing: I don't even know how to *begin* debugging errors
like this.  This is where I could use some help.

>>> a=u'\u0104'
>>>
>>> type(a)
<type 'unicode'>
>>>
>>> nu=a.encode('utf-8')
>>>
>>> type(nu)
<type 'str'>


See what I mean? You encode INTO a byte string (str), and decode OUT OF a byte string (back into unicode).
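
Here is a minimal sketch of that direction rule. It is written so that it should run unchanged on Python 2.7 and Python 3; the variable names are only for illustration:

```python
# Text (unicode) -> bytes is ENCODING; bytes -> text is DECODING.
text = u'\u0104'             # one character of text: LATIN CAPITAL LETTER A WITH OGONEK

data = text.encode('utf-8')  # encode INTO a byte string
assert data == b'\xc4\x84'   # two bytes in UTF-8

roundtrip = data.decode('utf-8')  # decode OUT OF the byte string
assert roundtrip == text          # back to the original single character
```

The byte string is longer than the text (two bytes for one character), which is a quick way to tell which side of the conversion you are on.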

To make matters more complicated, str.encode() first implicitly DECODES the byte string into unicode using the default 'ascii' codec, and only then encodes it:

>>> nu
'\xc4\x84'
>>>
>>> type(nu)
<type 'str'>
>>> nu.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
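
As far as I can tell, the failing call above behaves roughly like nu.decode('ascii').encode('ascii') on Python 2, so the error comes from the hidden decode step, not from the encode you asked for. A small sketch, assuming that equivalence, that triggers the same failure explicitly and then names the real encoding instead:

```python
nu = b'\xc4\x84'  # the UTF-8 bytes for u'\u0104', as in the session above

# Explicit version of the hidden step: non-ASCII bytes cannot be decoded as ASCII.
try:
    nu.decode('ascii')
    failed = False
except UnicodeDecodeError:
    failed = True

# Naming the encoding the bytes were actually written in works fine.
a = nu.decode('utf-8')
```

So when a UnicodeDecodeError mentions the 'ascii' codec, a reasonable first step is to look for a byte string being used where unicode is expected, and to decode it explicitly with the encoding it really has.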

There's logic to this, although it makes my brain want to explode. :-)

Regards,
mk


--
http://mail.python.org/mailman/listinfo/python-list
