[issue15278] UnicodeDecodeError when readline in codecs.py

2013-01-07 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: resolution: -> duplicate; stage: patch review -> committed/rejected; status: open -> closed; superseder: -> UTF-16 incremental decoder doesn't support partial surrogate pair

[issue15278] UnicodeDecodeError when readline in codecs.py

2012-10-24 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: keywords: +needs review; stage: -> patch review

[issue15278] UnicodeDecodeError when readline in codecs.py

2012-10-10 Thread Walter Dörwald
Walter Dörwald added the comment: > >>> codecs.utf_8_decode('\u20ac'.encode('utf8')[:2]) > ('', 0) > > Oh... codecs.CODEC_decode are incremental decoders? I misunderstood this completely. No, those functions are not decoders, they're just helper functions used to implement the real incremental
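
A minimal sketch of the distinction drawn above, assuming Python 3 (the thread itself uses Python 2 syntax). It relies on the helper `codecs.utf_8_decode` accepting an optional `final` argument that defaults to false; the variable names are illustrative only.

```python
import codecs

# First two bytes of the three-byte UTF-8 encoding of U+20AC (EURO SIGN).
partial = '\u20ac'.encode('utf8')[:2]

# With final left at its default (False), the helper decodes nothing
# and reports that it consumed 0 bytes, as quoted in the thread.
text, consumed = codecs.utf_8_decode(partial)

# With final=True, the same truncated input is treated as an error.
try:
    codecs.utf_8_decode(partial, 'strict', True)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# An incremental decoder keeps the unconsumed bytes and finishes
# the character once the remaining byte arrives.
decoder = codecs.getincrementaldecoder('utf-8')()
first = decoder.decode(partial)
second = decoder.decode('\u20ac'.encode('utf8')[2:], final=True)
```

Here `first` is empty and `second` holds the complete character, because the decoder object carries the buffered bytes between calls while the helper function itself keeps no state.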

[issue15278] UnicodeDecodeError when readline in codecs.py

2012-10-09 Thread STINNER Victor
STINNER Victor added the comment: > I don't understand you. Read my last message, I was wrong.

[issue15278] UnicodeDecodeError when readline in codecs.py

2012-10-09 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > Hum no. The bug is an issue in the design of codecs.Stream* classes: > incremental decoders and encoders should be used instead of classic > decoders/encoders. I don't understand you. StreamReader and IncrementalDecoder both use the same decoder. class I

[issue15278] UnicodeDecodeError when readline in codecs.py

2012-10-09 Thread STINNER Victor
STINNER Victor added the comment: >>> codecs.utf_8_decode('\u20ac'.encode('utf8')[:2]) ('', 0) Oh... codecs.CODEC_decode are incremental decoders? I misunderstood this completely. "The bug is an issue in the design of codecs.Stream* classes: incremental decoders and encoders should be used ins

[issue15278] UnicodeDecodeError when readline in codecs.py

2012-10-09 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > This issue may be related or a duplicate of #11461. Oh, yes, it is a duplicate. I totally forgot about it and did the work again. > Only incremental decoders should return partial results. Other decoders are > strict and (usually) stateless. Yes, there is

[issue15278] UnicodeDecodeError when readline in codecs.py

2012-10-09 Thread STINNER Victor
STINNER Victor added the comment: > This issue may be related or a duplicate of #11461. Hum no. The bug is an issue in the design of codecs.Stream* classes: incremental decoders and encoders should be used instead of classic decoders/encoders. I don't want to fix this issue: it's better to mo

[issue15278] UnicodeDecodeError when readline in codecs.py

2012-10-09 Thread STINNER Victor
STINNER Victor added the comment: > with codecs.open('test.txt', 'wb', 'utf-16-le') as fp: Since Python 2.6, you can use io.open(), which uses the new io library. The io library uses TextIOWrapper, which uses an incremental encoder and decoder and so handles multibyte encodings correctly (as UTF-1
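
The suggestion above can be sketched as follows, assuming Python 3 and reusing the test string from the original report. The temporary file and variable names are illustrative and not part of the thread.

```python
import io
import os
import tempfile

# Same text as in the original report: one BMP character followed by
# 18 non-BMP characters, each stored as a surrogate pair in UTF-16.
text = '\u6731' + '\U0002a6a5' * 18

fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    # io.open() returns a TextIOWrapper, which encodes and decodes
    # incrementally instead of calling the one-shot codec functions.
    with io.open(path, 'w', encoding='utf-16-le') as fp:
        fp.write(text)
    with io.open(path, 'r', encoding='utf-16-le') as fp:
        lines = [line for line in fp]
finally:
    os.remove(path)
```

Iterating over the file this way yields the original text back in `lines` without a UnicodeDecodeError, since a chunk boundary falling inside a surrogate pair is buffered by the incremental decoder.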

[issue15278] UnicodeDecodeError when readline in codecs.py

2012-10-09 Thread STINNER Victor
STINNER Victor added the comment: This issue may be related or a duplicate of #11461. > For example codecs.utf_16_le_decode(b'\x00\xd8\x00') should return ('', 0), > but raises UnicodeDecodeError. Only incremental decoders should return partial results. Other decoders are strict and (usually)

[issue15278] UnicodeDecodeError when readline in codecs.py

2012-10-08 Thread Antoine Pitrou
Changes by Antoine Pitrou: nosy: +haypo

[issue15278] UnicodeDecodeError when readline in codecs.py

2012-10-08 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Here are the patches. -- keywords: +patch Added file: http://bugs.python.org/file27495/utf16_partial_decode-3.3.patch Added file: http://bugs.python.org/file27496/utf16_partial_decode-3.2.patch Added file: http://bugs.python.org/file27497/utf16_partial

[issue15278] UnicodeDecodeError when readline in codecs.py

2012-10-08 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: This error happens because the utf16* decoders do not properly handle partial decoding of truncated data. An exception is raised if the input data is truncated within the second surrogate of a surrogate pair. For example, codecs.utf_16_le_decode(b'\x00\xd8\x00') should return
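
A small sketch of the truncation case described above, assuming Python 3 on an interpreter that already includes the fix for the superseding issue (on an unpatched interpreter the first `decode` call is where the exception in this report would surface). U+10000 is chosen only because its UTF-16-LE encoding starts with the bytes used in the example.

```python
import codecs

# U+10000 encodes in UTF-16-LE as the surrogate pair D800 DC00.
data = '\U00010000'.encode('utf-16-le')  # b'\x00\xd8\x00\xdc'

decoder = codecs.getincrementaldecoder('utf-16-le')()

# Feed the high surrogate plus only half of the low surrogate:
# b'\x00\xd8\x00'. The decoder should wait for more input.
head = decoder.decode(data[:3])

# Supplying the last byte completes the pair.
tail = decoder.decode(data[3:], final=True)
```

With the fix in place, `head` is an empty string and `tail` is the single character U+10000, which is the partial-decoding behavior the comment asks for.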

[issue15278] UnicodeDecodeError when readline in codecs.py

2012-10-08 Thread Marcus Gröber
Marcus Gröber added the comment: I came across this today as well. A short way of summarizing this error seems to be: Reading a file using readline (or "for line in file") fails if the following two conditions are true: • A codec (e.g. UTF-8) for a multi-byte encoding is used, and •

[issue15278] UnicodeDecodeError when readline in codecs.py

2012-07-07 Thread lovelylain
New submission from lovelylain : This is an example, `for line in fp` will raise UnicodeDecodeError: #! -*- coding: utf-8 -*- import codecs text = u'\u6731' + u'\U0002a6a5' * 18 print repr(text) with codecs.open('test.txt', 'wb', 'utf-16-le') as fp: fp.write(text) with codecs.open('test.tx