Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\ua000' in position 0: ordinal not in range(128)

Dave Angel Wed, 14 Jan 2015 02:27:03 -0800

On 01/13/2015 10:26 PM, Peng Yu wrote:

Hi,

First, you should always specify your Python version and OS version when asking questions here. Even if you've been asking questions, many of us cannot keep track of everyone's specifics, and need to refer to a standard place, the head of the current thread.


I'll assume you're using Python 2.7, on Linux or equivalent.

I am trying to understand what does encode() do. What are the hex
representations of "u" in main.py? Why there is UnicodeEncodeError
when main.py is piped to xxd? Why there is no such error when it is
not piped? Thanks.

~$ cat main.py
#!/usr/bin/env python

u = unichr(40960) + u'abcd' + unichr(1972)
print u

The unicode characters in 'u' must be encoded to a byte stream before being sent to the standard out device. How they're encoded depends on the device, and what Python knows (or thinks it knows) about it.
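To make the "hex representation" part of the question concrete, here is a minimal sketch of that encoding step done by hand. It assumes nothing beyond the standard library; the variable names (utf8_bytes, ascii_ok) are just for illustration, and the string literal is meant to be the same text as unichr(40960) + u'abcd' + unichr(1972).

```python
import binascii

# Same text as unichr(40960) + u'abcd' + unichr(1972) in the original script.
u = u'\ua000' + u'abcd' + u'\u07b4'

# UTF-8 can represent every code point, so this always succeeds.
utf8_bytes = u.encode('utf-8')
utf8_hex = binascii.hexlify(utf8_bytes).decode('ascii')
print(utf8_hex)  # ea808061626364deb4

# ASCII only covers code points 0..127, so the first character already fails.
try:
    u.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False
```

The hex string is the same byte sequence xxd shows for main_encode.py, minus the trailing newline (0a) that print adds.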

~$ cat main_encode.py
#!/usr/bin/env python

u = unichr(40960) + u'abcd' + unichr(1972)
print u.encode('utf-8')

Here, print is trying to send bytes to a byte-device, and doesn't try to second-guess anything.

$ ./main.py
ꀀabcd޴
~$ cat main.sh
#!/usr/bin/env bash

set -v
./main.py | xxd
./main_encode.py | xxd

~$ ./main.sh
./main.py | xxd
Traceback (most recent call last):
   File "./main.py", line 4, in <module>
     print u
UnicodeEncodeError: 'ascii' codec can't encode character u'\ua000' in
position 0: ordinal not in range(128)
./main_encode.py | xxd
0000000: ea80 8061 6263 64de b40a                 ...abcd...

I'm guessing (since I already guessed you're running on Linux) that in the plain ./main.py case, you're printing to a terminal window that Python already knows is utf-8.

But in the pipe case, it cannot tell what's on the other side. So it guesses ASCII, and runs into the conversion problem.
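If you want to see what the interpreter believes about its output, something like the following sketch may help. The helper name describe_stream is made up for this example. On Python 2.7 a piped sys.stdout normally reports an encoding of None, and print then falls back to the default 'ascii' codec; the exact values you see depend on your terminal and locale.

```python
import io
import sys

def describe_stream(stream):
    """Return (is_a_terminal, encoding) for a file-like object.

    Both lookups are defensive, because a replaced or wrapped stream
    may lack either attribute.
    """
    isatty = getattr(stream, 'isatty', None)
    is_terminal = bool(isatty()) if callable(isatty) else False
    encoding = getattr(stream, 'encoding', None)
    return (is_terminal, encoding)

# What Python thinks about the real standard output right now.
stdout_info = describe_stream(sys.stdout)

# A plain byte buffer, standing in for "something that is not a terminal".
buffer_info = describe_stream(io.BytesIO())
print(buffer_info)  # (False, None)
```

Running a script that prints stdout_info once directly and once piped through xxd (or cat) should show the difference between the two cases.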

(Everything's different in Python 3.x, though in general the problem still exists. If the interpreter cannot tell what encoding is needed, it has to guess.)

There are ways to tell Python 2.7 what encoding a given file object should have, so you could tell Python to use utf-8 for sys.stdout. I don't know if that's the best answer, but here's what my notes say:


    import sys, codecs
    sys.stdout = codecs.getwriter('utf8')(sys.stdout)

Once you've done that, print output will go through the specified codec on the way to the redirected pipe.
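To check what that wrapper does without touching the real sys.stdout, here is a small sketch that uses an io.BytesIO object as a stand-in for the pipe. The stand-in is an assumption made for the demonstration; the names fake_pipe and writer are not from the original scripts.

```python
import codecs
import io

# A byte buffer playing the role of the pipe to xxd.
fake_pipe = io.BytesIO()

# Same wrapping as above, applied to the stand-in instead of sys.stdout.
writer = codecs.getwriter('utf8')(fake_pipe)

# Writing unicode text now encodes it to UTF-8 on the way through.
writer.write(u'\ua000' + u'abcd' + u'\u07b4' + u'\n')

data = fake_pipe.getvalue()
print(len(data))  # 10
```

The ten bytes collected in data are the same ones xxd reports for main_encode.py: ea 80 80, then abcd, then de b4 and the newline.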



--
DaveA

--
https://mail.python.org/mailman/listinfo/python-list
