akhil1988 wrote:
<mis-ordered reply, bits shown below>
Nobody-38 wrote:
On Thu, 16 Jul 2009 15:43:37 -0700, akhil1988 wrote:
...
In Python 3 you can't decode strings because they are Unicode strings
and it doesn't make sense to decode a Unicode string. You can only
decode encoded things which are byte strings. So you are mixing up byte
strings and Unicode strings.
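To make the distinction concrete, here is a minimal sketch in Python 3. The sample value is an assumption chosen for illustration, not data from the original poster's file.

```python
# Assumed sample data, not taken from the thread.
text = "<page>"                  # str: already a Unicode string
raw = text.encode("utf-8")       # bytes: the encoded form of that string

# Only the byte string can be decoded.
decoded = raw.decode("utf-8")

# In Python 3, str has no decode() method at all.
str_has_decode = hasattr(text, "decode")

print(type(raw).__name__, type(decoded).__name__, str_has_decode)
```

Calling text.decode('utf-8') on the str would raise AttributeError, which is why the decode call has to go once the input is already text.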
... I read a byte string from sys.stdin which needs to be converted to a
Unicode string for further processing.
In 3.x, sys.stdin (stdout, stderr) are text streams, which means that they
read and write Unicode strings, not byte strings.
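A rough illustration of what "text stream" means here: a text stream wraps a binary stream and does the decoding itself, so iteration yields str objects. The sketch below uses io.BytesIO as a stand-in for the bytes arriving on standard input; the sample content is an assumption.

```python
import io

# Stand-in for the raw bytes arriving on standard input (assumed sample data).
raw_input_bytes = io.BytesIO("<page>\ncafé\n".encode("utf-8"))

# A text stream layered over a binary stream, decoding as it reads.
text_stream = io.TextIOWrapper(raw_input_bytes, encoding="utf-8")

# Each line comes back as str, so there is nothing left to decode.
lines = [line.strip() for line in text_stream]
print(lines)
```

In a real script the binary layer underneath sys.stdin is available as sys.stdin.buffer.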
I cannot just remove the decode statement and proceed?
This is what it looks like:
    for line in sys.stdin:
        line = line.decode('utf-8').strip()
        if line == '<page>': #do something here
        ....
If I remove the decode statement, line == '<page>' never becomes true.
Did you inadvertently remove the strip() as well?
... I unintentionally removed strip()....
I get this error now:
  File "./temp.py", line 488, in <module>
    main()
  File "./temp.py", line 475, in main
    for line in sys.stdin:
  File "/usr/local/lib/python3.1/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data
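Note that the traceback points at the for statement itself: the stream decodes while it is being iterated, so bytes that do not match the stream's encoding fail there rather than in your own code. A small sketch that reproduces the same kind of failure, assuming Latin-1 bytes fed to a UTF-8 text stream (the actual encoding of the poster's input is not known from the thread):

```python
import io

# Assumed sample: "café" encoded as Latin-1, which is not valid UTF-8.
bad_bytes = io.BytesIO("café\n".encode("latin-1"))
stream = io.TextIOWrapper(bad_bytes, encoding="utf-8")

failed = False
error = None
try:
    for line in stream:          # decoding happens here, during iteration
        pass
except UnicodeDecodeError as exc:
    failed = True
    error = exc                  # keep a reference for inspection

print(failed, type(error).__name__)
```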
(1) Do not top post.
(2) Try to fully understand the problem and proposed solution, rather
than trying to get people to tell you just enough to get your code
going.
(3) The only way sys.stdin can possibly return Unicode is to do some
decoding of its own. Your job is to make sure it uses the correct
encoding. So, if you know your source is always UTF-8, try
something like:
    import sys
    import io

    sys.stdin = io.TextIOWrapper(sys.stdin.detach(), encoding='utf8')
    for line in sys.stdin:
        line = line.strip()
        if line == '<page>':
            #do something here
            ....
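If the source is mostly UTF-8 but may contain a few stray bytes, one possible variation is to pass errors='replace' to io.TextIOWrapper so that undecodable bytes become U+FFFD instead of raising. The helper name utf8_lines and the sample bytes below are hypothetical, used only for illustration; this is a sketch, not a recommendation to hide encoding problems.

```python
import io

def utf8_lines(binary_stream, errors="replace"):
    """Yield stripped text lines decoded as UTF-8 from a binary stream.

    Hypothetical helper: with errors="replace", invalid bytes are turned
    into U+FFFD rather than raising UnicodeDecodeError.
    """
    text = io.TextIOWrapper(binary_stream, encoding="utf-8", errors=errors)
    for line in text:
        yield line.strip()

# Assumed sample input: one valid line and one line with a stray Latin-1 byte.
sample = io.BytesIO(b"<page>\ncaf\xe9\n")
result = list(utf8_lines(sample))
print(result)
```

In a real script you would pass sys.stdin.buffer (or the result of sys.stdin.detach()) instead of the BytesIO stand-in. Passing errors='strict' restores the default behaviour of raising on bad input.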
--Scott David Daniels
scott.dani...@acm.org