Sean Gillespie added the comment: Went ahead and did it since I had the time. The issue is that the tokenizer performs one token of lookahead to decide whether an 'async' at the top level begins an 'async def' function or is an ordinary identifier. To do this, it makes a shallow copy of the current token and passes it to another call to tok_get, which frees the token's buffer if a decoding error occurs. Because the shallow copy cloned the token's buffer pointer, the still-live token is left holding a dangling pointer to that buffer, which is then freed again later on.
By explicitly nulling out the live token's buffer pointer, just as tok_get does, whenever the copied token's buffer pointer was nulled out, we avoid the double free and report the correct syntax error:

$ ./python vuln.py
  File "vuln.py", line 1
SyntaxError: Non-UTF-8 code starting with '\xef' in file vuln.py on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

William Bowling's second program is also fixed by this change, with one additional wrinkle: if a token contains a null byte as its first character, an invalid write occurs when we attempt to replace the null character with a newline. The fix checks that this is not the case before inserting the newline. With this change, both of William Bowling's programs pass valgrind and report the appropriate syntax error.

I tried to add this to the coroutine syntax tests, but any way of loading the file other than passing it to ./python itself fails (correctly) because the program contains a null byte.

----------
keywords: +patch
Added file: http://bugs.python.org/file41995/tokenizer_double_free.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26000>
_______________________________________