Dingyuan Wang added the comment:

The new patch should now pass all tests correctly.
The main idea is:

* if the token is INDENT, push it on the `indents` stack and continue
* if a new line starts, AND the position of the first token is >= the length
  of the last indent level, we assume the indent is contained in the leading
  whitespace (a rough sketch of this heuristic is at the end of this message)

The new test_tokenize.py fails:
https://bitbucket.org/jaraco/cpython-issue20387/src/b7fe3c865b8dbdb33d26f4bc5cbb6096f5445fb2/Lib/test/test_tokenize.py?at=3.4#cl-1244

Line 1244 should be:

    codelines = self.roundtrip(code).split(b'\n')

It seems that the tokens generated by tokenize.tokenize don't contain enough
information to restore the original file (a short demonstration is at the end
of this message):

* Tabs between tokens are not preserved.
* Spaces before a backslash used as a line continuation are not preserved.

(From test/tokenize_tests.txt)

    # Backslash means line continuation:
    -x = 1 \
    +x = 1\
    + 1

My roundtrip test code, copied here from #24447:

    python2 -c 'import sys, tokenize; sys.stdout.write(tokenize.untokenize(tokenize.generate_tokens(sys.stdin.readline)))'
    python3 -c 'import sys, tokenize; sys.stdout.buffer.write(tokenize.untokenize(tokenize.tokenize(sys.stdin.buffer.readline)))'

----------
Added file: http://bugs.python.org/file39763/tokenize.patch
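
For illustration, a minimal runnable sketch of the indent heuristic described
above. It only does the bookkeeping and reports, per line, which stored INDENT
text would be emitted as leading whitespace; the helper name
leading_whitespace is made up for this example, and this is not the patch
itself:

    import io
    import tokenize
    from tokenize import INDENT, DEDENT, NEWLINE, NL

    def leading_whitespace(readline):
        # Remember the exact text of each INDENT token (it may contain
        # tabs) on a stack.  When a new line starts and its first token
        # begins at or beyond the last indent level, assume that stored
        # indent text is the line's leading whitespace.
        indents = []          # stack of INDENT token strings
        startline = False     # True right after a NEWLINE/NL token
        result = {}           # line number -> whitespace we would emit
        for tok in tokenize.generate_tokens(readline):
            if tok.type == INDENT:
                indents.append(tok.string)
                continue
            if tok.type == DEDENT:
                if indents:
                    indents.pop()
                continue
            if tok.type in (NEWLINE, NL):
                startline = True
            elif startline:
                if indents and tok.start[1] >= len(indents[-1]):
                    result[tok.start[0]] = indents[-1]
                startline = False
        return result

    code = "if x:\n\tif y:\n\t\tpass\n"
    print(leading_whitespace(io.StringIO(code).readline))
    # {2: '\t', 3: '\t\t'} -- the tab indents come back exactly, not as spaces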
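
And a condensed Python version of the roundtrip one-liners above, showing the
two losses listed earlier (the exact output may vary with the tokenize
version):

    import io
    import tokenize

    source = "x\t=\t1 \\\n    + 1\n"
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    restored = tokenize.untokenize(tokens)
    print(repr(source))    # 'x\t=\t1 \\\n    + 1\n'
    print(repr(restored))  # expected: 'x = 1\\\n    + 1\n'
    # The roundtrip is not byte-for-byte identical: the tabs between tokens
    # come back as single spaces, and '1 \' becomes '1\' as in the diff above.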