[issue38800] Resume position for UTF-8 codec error handler not working

2019-11-15 Thread Serhiy Storchaka
Change by Serhiy Storchaka : -- resolution: -> out of date stage: -> resolved status: open -> closed ___ Python tracker ___ ___ Py

[issue38800] Resume position for UTF-8 codec error handler not working

2019-11-14 Thread Pedro Gimeno
Pedro Gimeno added the comment: Python 3.5 from Debian stretch (oldstable). You're right, I can't reproduce it in 3.7 from Buster. Sorry for the bogus report. -- ___ Python tracker _

[issue38800] Resume position for UTF-8 codec error handler not working

2019-11-14 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: It works to me (after fixing the assertion). What Python version do you use? In Python 2 u'\uDD00' is encodable to UTF-8, so the error handler is not called. u'\uDD00yz'.encode('utf8') gives '\xed\xb4\x80yz'. -- nosy: +serhiy.storchaka __

[issue38800] Resume position for UTF-8 codec error handler not working

2019-11-14 Thread Pedro Gimeno
Pedro Gimeno added the comment: I forgot the quotes in the assertion, it should have been "b'xz'". -- ___ Python tracker ___ ___ Py

[issue38800] Resume position for UTF-8 codec error handler not working

2019-11-14 Thread Pedro Gimeno
New submission from Pedro Gimeno : When implementing an error handler, it must return a tuple consisting of a substitution string and a position where to resume decoding. In the case of the UTF-8 codec, the resume position is ignored, and it always resumes immediately after the character that