Serhiy Storchaka added the comment:

Yes, there is a bug. When decoding_fgets() encounters non-UTF-8 bytes, it fails and frees the input buffer in error_ret(). But since tok->cur != tok->inp still holds, the next call of tok_nextc() reads freed memory.
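For illustration, here is a minimal self-contained sketch of this failure mode. The struct and names below are hypothetical simplifications, not the actual tokenizer code, and the program deliberately commits the undefined behavior being described:

    /* Sketch of the use-after-free: the line buffer is freed on a
     * decoding error, but the cursor still points into it, so the
     * "fast path" reads freed memory and the error offset is later
     * computed against the wrong buffer. Names are hypothetical. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct tok_state_sketch {
        char *buf;   /* start of the current line buffer */
        char *cur;   /* next character to return */
        char *inp;   /* end of valid data in buf */
    };

    int main(void)
    {
        struct tok_state_sketch tok;

        /* A line has been read into the buffer; part of it is consumed. */
        tok.buf = malloc(16);
        strcpy(tok.buf, "print(x)\n");
        tok.cur = tok.buf + 5;               /* mid-line */
        tok.inp = tok.buf + strlen(tok.buf);

        /* Decoding error path: the error handler frees the buffer ... */
        free(tok.buf);
        /* ... but cur and inp are left dangling, so cur != inp still holds. */

        if (tok.cur != tok.inp) {
            /* Fast path: undefined behavior, reads freed memory. */
            printf("read from freed buffer: %d\n", *tok.cur);
        }

        /* Later a fresh buffer is allocated and assigned to buf ... */
        tok.buf = malloc(16);
        /* ... and the error offset is computed from the *stale* cursor,
         * i.e. a subtraction of pointers into two unrelated objects: */
        long offset = (long)(tok.cur - tok.buf);   /* garbage, can be huge */
        printf("bogus offset: %ld\n", offset);

        /* The consistent-state idea behind the fix (simplified): on a
         * decoding error, set tok.done and reset buf/cur/inp together,
         * so that cur == inp and no path dereferences the freed buffer. */

        free(tok.buf);
        return 0;
    }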
The fast path in tok_nextc() is:

    if (tok->cur != tok->inp) {
        return Py_CHARMASK(*tok->cur++); /* Fast path */
    }

If Python does not crash here, a new buffer is allocated and assigned to tok->buf, then PyTokenizer_Get returns an error and parsetok() calculates the position of the error:

    err_ret->offset = (int)(tok->cur - tok->buf);

But tok->cur still points into the old, freed buffer, so the computed offset is a far too large integer. err_input() then tries to decode the part of the string before the error position with the "replace" error handler, but since the position was calculated wrongly, it reads out of allocated memory.

The proposed patch fixes the issue. It sets tok->done and the pointers in case of a decoding error, so they are now in a consistent state. It also removes some duplicated or dead code.

----------
stage: -> patch review
Added file: http://bugs.python.org/file40965/issue25388.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25388>
_______________________________________