Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)

Scott David Daniels Fri, 17 Jul 2009 10:33:12 -0700

akhil1988 wrote:
<mis-ordered reply, bits shown below>>

Nobody-38 wrote:

On Thu, 16 Jul 2009 15:43:37 -0700, akhil1988 wrote:

...

In Python 3 you can't decode strings because they are Unicode strings
and it doesn't make sense to decode a Unicode string. You can only
decode encoded things which are byte strings. So you are mixing up byte
strings and Unicode strings.
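
A minimal sketch of that distinction in Python 3. The sample bytes are made up for illustration and do not come from the thread: bytes objects have decode(), str objects have encode(), and str has no decode() method at all.

```python
# Hypothetical sample data, not taken from the original thread.
raw = b"caf\xc3\xa9"            # a byte string holding UTF-8 encoded text
text = raw.decode("utf-8")      # bytes -> str (Unicode string)

# Encoding the str again gives back the original bytes.
round_trip = text.encode("utf-8")

# In Python 3 a str cannot be decoded: the method does not exist.
str_has_decode = hasattr(text, "decode")
bytes_has_decode = hasattr(raw, "decode")
```

Calling line.decode(...) on a value that is already a str therefore fails with AttributeError rather than doing any conversion.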

... I read a byte string from sys.stdin which needs to be converted to a
unicode string for further processing.

In 3.x, sys.stdin (stdout, stderr) are text streams, which means that they
read and write Unicode strings, not byte strings.

I cannot just remove the decode statement and proceed?
This is what it looks like:
    for line in sys.stdin:
        line = line.decode('utf-8').strip()
        if line == '<page>': #do something here
        ....

If I remove the decode statement, line == '<page>' never gets true.

Did you inadvertently remove the strip() as well?

... unintentionally I removed strip()....
I get this error now:
  File "./temp.py", line 488, in <module>
    main()
  File "./temp.py", line 475, in main
    for line in sys.stdin:
  File "/usr/local/lib/python3.1/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data
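
One way an error of this kind can arise, sketched under an assumption: the thread does not say what the input actually contained, so the bytes below are hypothetical. Bytes written in one encoding (here Latin-1) are not necessarily valid UTF-8, and decoding them as UTF-8 raises UnicodeDecodeError.

```python
# Hypothetical input: the middle dot U+00B7 encoded as Latin-1, not UTF-8.
data = b"\xb7 item"

try:
    data.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    # 0xb7 on its own is not a valid UTF-8 sequence.
    utf8_failed = True

# The same bytes decode cleanly with the codec they were written in.
as_latin1 = data.decode("latin-1")
```

The point is only that the decoder and the data have to agree; which codec is right depends on where the input came from.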


(1) Do not top post.
(2) Try to fully understand the problem and proposed solution, rather
    than trying to get people to tell you just enough to get your code
    going.
(3) The only way sys.stdin can possibly return unicode is to do some
    decoding of its own.  Your job is to make sure it uses the correct
    decoding.  So, if you know your source is always utf-8, try
    something like:

    import sys
    import io

    sys.stdin = io.TextIOWrapper(sys.stdin.detach(), encoding='utf8')

    for line in sys.stdin:
        line = line.strip()
        if line == '<page>':
            pass  # do something here
        # ....
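
A self-contained sketch of the same wrapping idea that can be run without piping anything in. An io.BytesIO object stands in for the binary stream that sys.stdin.detach() would return, and the sample data is made up for illustration:

```python
import io

# Stand-in for sys.stdin.detach(): a binary stream of UTF-8 bytes.
# The contents are hypothetical sample data.
raw = io.BytesIO("<page>\n\u00b7 item\n".encode("utf-8"))

# Wrap the binary stream so that iteration yields decoded str lines.
stream = io.TextIOWrapper(raw, encoding="utf-8")

lines = [line.strip() for line in stream]
found_page = any(line == "<page>" for line in lines)
```

Because the wrapper does the decoding, the loop body only ever sees str objects and the comparison with '<page>' works without any decode() call.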

--Scott David Daniels
--
http://mail.python.org/mailman/listinfo/python-list
