Re: Need debugging knowhow for my creeping Unicodephobia

mk Thu, 11 Feb 2010 08:47:05 -0800

kj wrote:

I have read a *ton* of stuff on Unicode.  It doesn't even seem all
that hard.  Or so I think.  Then I start writing code, and WHAM:


UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

(There, see?  My Unicodephobia just went up a notch.)

Here's the thing: I don't even know how to *begin* debugging errors
like this.  This is where I could use some help.


>>> a=u'\u0104'
>>>
>>> type(a)
<type 'unicode'>
>>>
>>> nu=a.encode('utf-8')
>>>
>>> type(nu)
<type 'str'>


See what I mean? You encode INTO a str (bytes), and decode OUT OF a str (back to unicode).
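
For comparison, here is a minimal sketch of the same round trip, assuming Python 3, where the two types are renamed: Python 2's unicode became str, and Python 2's byte-oriented str became bytes. The names a and nu are carried over from the session above; back is just an illustrative name.

```python
# Python 3 sketch (assumption: the session above was run under Python 2).
a = '\u0104'               # text: a single code point
nu = a.encode('utf-8')     # encode: text -> bytes
back = nu.decode('utf-8')  # decode: bytes -> text

print(type(a).__name__, type(nu).__name__, type(back).__name__)
print(nu)
print(back == a)
```

The direction is the same in both versions: encode goes from text to bytes, decode goes from bytes to text.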

To make matters more complicated, str.encode() internally DECODES from string into unicode first:


>>> nu
'\xc4\x84'
>>>
>>> type(nu)
<type 'str'>
>>> nu.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
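
Roughly what happens there, spelled out: in Python 2, calling .encode() on a byte string first decodes it with the default codec (normally ASCII), and that hidden step is the one that blows up. The sketch below assumes Python 3 syntax, where bytes has no .encode() at all, so the decode has to be written by hand; message is an illustrative name.

```python
nu = b'\xc4\x84'   # the UTF-8 bytes from the session above

try:
    # Stand-in for the implicit step Python 2 performs inside str.encode().
    nu.decode('ascii')
except UnicodeDecodeError as exc:
    message = str(exc)
else:
    message = None

print(message)
```

So the "decode" in the error name is not a mistake: the failure happens while turning bytes into text, before any encoding is attempted.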


There's logic to this, although it makes my brain want to explode. :-)
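
As for where to *begin* debugging: the exception object itself records which codec was applied, to which bytes, and at what offset. A small sketch, assuming Python 3 syntax; describe_decode_error is a hypothetical helper written for this post, not something from the standard library, and the sample bytes are made up to start with the same 0xc2 as in the original error.

```python
def describe_decode_error(data, encoding):
    """Return a short report if `data` cannot be decoded, else None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        bad = exc.object[exc.start:exc.end]   # the offending byte(s)
        return "codec=%s offset=%d bytes=%r reason=%s" % (
            exc.encoding, exc.start, bad, exc.reason)
    return None

sample = b'\xc2\xa0abc'   # made-up input: valid UTF-8, but not ASCII
print(describe_decode_error(sample, 'ascii'))
print(describe_decode_error(sample, 'utf-8'))
```

If the reported codec is 'ascii' and you never asked for ASCII, some implicit conversion between byte strings and unicode is happening on that line.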

Regards,
mk


--
http://mail.python.org/mailman/listinfo/python-list
