New submission from Pedro Gimeno <pgpyb-4...@personal.formauri.es>:

When implementing a codec error handler, the handler must return a tuple consisting of a substitution string and the position at which to resume encoding or decoding. In the case of the UTF-8 codec, the resume position is ignored, and it always resumes immediately after the character that caused the error.

To reproduce, use this code:

    import codecs
    codecs.register_error('err', lambda err: (b'x', err.end + 1))
    assert u'\uDD00yz'.encode('utf8', errors='err') == b'xz'

The above code fails the assertion because the result is b'xyz'.

It works OK for some other codecs. I have not tried to make an exhaustive list of which ones work and which ones don't, so this problem might apply to others as well.

----------
messages: 356610
nosy: pgimeno
priority: normal
severity: normal
status: open
title: Resume position for UTF-8 codec error handler not working
type: behavior

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue38800>
_______________________________________
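
Since the report notes that no exhaustive list of affected codecs was made, the following is a minimal sketch of how such a comparison could be run. The handler name 'skip_one', the helper probe(), and the choice of codecs are assumptions made for illustration and are not part of the original report. The script only prints what each codec produces; which outputs appear depends on the Python version in use.

```python
import codecs


def skip_one(err):
    """Hypothetical handler: substitute b'x' and ask the codec to
    resume one character past the end of the failing span."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    # err.start and err.end delimit the unencodable span in err.object.
    return (b'x', err.end + 1)


codecs.register_error('skip_one', skip_one)


def probe(text, encodings):
    """Encode text with each codec and collect the result, or the
    exception if the codec rejects the handler's return value."""
    results = {}
    for name in encodings:
        try:
            results[name] = text.encode(name, errors='skip_one')
        except Exception as exc:  # keep going so every codec is reported
            results[name] = exc
    return results


# The lone surrogate U+DD00 is unencodable in all three codecs below.
results = probe('\udd00yz', ['ascii', 'latin-1', 'utf-8'])
for name, outcome in results.items():
    print(name, repr(outcome))
```

A codec that honours the resume position skips the 'y' and yields b'xz', whereas one that ignores it yields b'xyz', as described for UTF-8 in the report.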