Dingyuan Wang added the comment:

The new patch should now pass all tests correctly.
The main idea is:

* if the token is INDENT, push it on the `indents` stack and continue
* if a new line starts, AND the position of the first token is >= the length
  of the last indent level, we assume the indent is contained in the leading
  whitespace (a rough sketch of this heuristic is at the end of this message)

The new test_tokenize.py fails:
https://bitbucket.org/jaraco/cpython-issue20387/src/b7fe3c865b8dbdb33d26f4bc5cbb6096f5445fb2/Lib/test/test_tokenize.py?at=3.4#cl-1244

Line 1244 should be:

    codelines = self.roundtrip(code).split(b'\n')

It seems that the tokens generated by tokenize.tokenize don't contain enough
information to restore the original file (a short demonstration is at the end
of this message):

* Tabs between tokens are not preserved.
* Spaces before a backslash used as a line continuation are not preserved.

(From test/tokenize_tests.txt)

    # Backslash means line continuation:
    -x = 1 \
    +x = 1\
    + 1

My roundtrip test code, copied here from #24447:

    python2 -c 'import sys, tokenize; sys.stdout.write(tokenize.untokenize(tokenize.generate_tokens(sys.stdin.readline)))'
    python3 -c 'import sys, tokenize; sys.stdout.buffer.write(tokenize.untokenize(tokenize.tokenize(sys.stdin.buffer.readline)))'

----------
Added file: http://bugs.python.org/file39763/tokenize.patch
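
For illustration, a minimal runnable sketch of the indent heuristic described
above. It only does the bookkeeping and reports, per line, which stored INDENT
text would be emitted as leading whitespace; the helper name
leading_whitespace is made up for this example, and this is not the patch
itself:

    import io
    import tokenize
    from tokenize import INDENT, DEDENT, NEWLINE, NL

    def leading_whitespace(readline):
        # Remember the exact text of each INDENT token (it may contain
        # tabs) on a stack.  When a new line starts and its first token
        # begins at or beyond the last indent level, assume that stored
        # indent text is the line's leading whitespace.
        indents = []          # stack of INDENT token strings
        startline = False     # True right after a NEWLINE/NL token
        result = {}           # line number -> whitespace we would emit
        for tok in tokenize.generate_tokens(readline):
            if tok.type == INDENT:
                indents.append(tok.string)
                continue
            if tok.type == DEDENT:
                if indents:
                    indents.pop()
                continue
            if tok.type in (NEWLINE, NL):
                startline = True
            elif startline:
                if indents and tok.start[1] >= len(indents[-1]):
                    result[tok.start[0]] = indents[-1]
                startline = False
        return result

    code = "if x:\n\tif y:\n\t\tpass\n"
    print(leading_whitespace(io.StringIO(code).readline))
    # {2: '\t', 3: '\t\t'} -- the tab indents come back exactly, not as spaces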
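
And a condensed Python version of the roundtrip one-liners above, showing the
two losses listed earlier (the exact output may vary with the tokenize
version):

    import io
    import tokenize

    source = "x\t=\t1 \\\n    + 1\n"
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    restored = tokenize.untokenize(tokens)
    print(repr(source))    # 'x\t=\t1 \\\n    + 1\n'
    print(repr(restored))  # expected: 'x = 1\\\n    + 1\n'
    # The roundtrip is not byte-for-byte identical: the tabs between tokens
    # come back as single spaces, and '1 \' becomes '1\' as in the diff above.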