[issue24214] Exception with utf-8, surrogatepass and incremental decoding

2016-07-27 Thread STINNER Victor
STINNER Victor added the comment: The attached patch fixes the UTF-8 decoder so that incremental decoding works correctly with the surrogatepass error handler. The bug occurs when b'\xed\xa4\x80' is decoded in two parts: the first two bytes (b'\xed\xa4'), and then the last byte (b'\x80'). It works as ex
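
The two-part decoding described in this comment can be sketched roughly as follows. This is not the demo from the report, only a minimal illustration: the helper names decode_whole and decode_in_parts are made up here, and the sketch assumes that an affected interpreter signals the problem with UnicodeDecodeError, which the helper turns into None.

```python
import codecs

# The three bytes from the comment above: the UTF-8-style encoding of the
# lone surrogate U+D900, which is only decodable with errors="surrogatepass".
DATA = b"\xed\xa4\x80"


def decode_whole(data):
    """Decode all bytes in one call (the case reported as working)."""
    return data.decode("utf-8", "surrogatepass")


def decode_in_parts(data, split_at):
    """Feed the bytes to an incremental decoder in two chunks.

    Returns the decoded text, or None if the decoder raises
    UnicodeDecodeError (assumed here to be how an affected version fails).
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="surrogatepass")
    try:
        head = decoder.decode(data[:split_at])
        tail = decoder.decode(data[split_at:], final=True)
    except UnicodeDecodeError:
        return None
    return head + tail


if __name__ == "__main__":
    # ascii() is used because a lone surrogate cannot be printed directly.
    print(ascii(decode_whole(DATA)))
    print(ascii(decode_in_parts(DATA, 2)))
```

Splitting at index 2 reproduces the b'\xed\xa4' / b'\x80' case from the comment; a None result from decode_in_parts would indicate an interpreter that still shows the behaviour discussed in this issue.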

[issue24214] Exception with utf-8, surrogatepass and incremental decoding

2016-07-26 Thread RalfM
RalfM added the comment: I just tested Python 3.6.0a3, and that (mis)behaves exactly like 3.4.3. -- versions: +Python 3.6

[issue24214] Exception with utf-8, surrogatepass and incremental decoding

2015-05-16 Thread RalfM
New submission from RalfM: I have a UTF-8 encoded file containing single surrogates. Reading this file, specifying surrogatepass, works fine when I read the whole file with .read(), but raises a UnicodeDecodeError when I read the file line by line: - start of demo - Python 3.4.3 (v3.4