Serhiy Storchaka added the comment:

Yes, there is a bug. When decoding_fgets() encounters non-UTF-8 bytes, it fails 
and frees the input buffer in error_ret(). But since tok->cur != tok->inp, the 
next call to tok_nextc() takes the fast path and reads freed memory:

        if (tok->cur != tok->inp) {
            return Py_CHARMASK(*tok->cur++); /* Fast path */
        }
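
For context, at that point error_ret() has freed the buffer but left the 
cursor fields untouched, roughly like this (a simplified sketch of the 
pre-patch code in Parser/tokenizer.c, not an exact copy):

    static char *
    error_ret(struct tok_state *tok)
    {
        tok->decoding_erred = 1;
        if (tok->fp != NULL && tok->buf != NULL)
            PyMem_FREE(tok->buf);
        tok->buf = NULL;
        /* tok->cur and tok->inp still point into the freed block,
           so the fast path above dereferences freed memory. */
        return NULL;            /* as if it were EOF */
    }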

If Python does not crash right there, a new buffer is allocated and assigned to 
tok->buf, then PyTokenizer_Get returns an error and parsetok() calculates the 
position of the error:

            err_ret->offset = (int)(tok->cur - tok->buf);

but tok->cur still points into the old, freed buffer, so the offset comes out 
as a huge bogus integer. err_input() then tries to decode the part of the line 
before the error with the "replace" error handler, but since the position was 
calculated incorrectly, it reads outside the allocated memory.
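
To see why the subtraction goes wrong: tok->cur and tok->buf now point into 
two different allocations, and the difference of such pointers is undefined; 
in practice it yields a huge bogus value (a standalone illustration, not 
CPython code):

    #include <stdio.h>
    #include <stddef.h>
    #include <stdlib.h>

    int main(void)
    {
        char *old_buf = malloc(16);     /* plays the freed tokenizer buffer */
        char *old_cur = old_buf + 3;    /* plays the stale tok->cur */
        free(old_buf);
        char *new_buf = malloc(16);     /* plays the newly allocated tok->buf */
        /* Undefined behaviour: subtracting pointers into different
           allocations; the result is meaningless. */
        printf("bogus offset: %td\n", (ptrdiff_t)(old_cur - new_buf));
        free(new_buf);
        return 0;
    }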

The proposed patch fixes the issue. On a decoding error it sets tok->done and 
the buffer pointers, so they are now left in a consistent state. It also 
removes some duplicated and dead code.
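
In outline, the consistent state means error_ret() resets every buffer 
pointer and records the error, something like this (a sketch of the intent, 
assuming the tok_state field names from Parser/tokenizer.c; the attached 
patch is authoritative):

    static char *
    error_ret(struct tok_state *tok)
    {
        tok->decoding_erred = 1;
        if (tok->fp != NULL && tok->buf != NULL)
            PyMem_FREE(tok->buf);
        /* No stale pointer survives, and tok->done tells callers
           to stop reading input. */
        tok->buf = tok->cur = tok->end = tok->inp = tok->start = NULL;
        tok->done = E_DECODE;
        return NULL;            /* as if it were EOF */
    }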

----------
stage:  -> patch review
Added file: http://bugs.python.org/file40965/issue25388.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25388>
_______________________________________