[issue20387] tokenize/untokenize roundtrip fails with tabs

Dingyuan Wang Sat, 20 Jun 2015 00:14:02 -0700

Dingyuan Wang added the comment:

Sorry for the inconvenience. I failed to find this old bug.


I think there is another problem. The docs for `untokenize` say: "The iterable 
must return sequences with **at least** two elements, the token type and the 
token string. Any additional sequence elements are ignored." So if I feed in, 
say, 3-tuples, untokenize should accept them and treat each one as tok[:2].
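
To make the tok[:2] reading concrete, here is a minimal sketch (not the 
attached patch). The helper name untokenize_first_two and the sample source 
line are made up for illustration; it only trims each token to its first two 
elements before handing it to the existing tokenize.untokenize.

```python
import io
import tokenize


def untokenize_first_two(tokens):
    """Hypothetical helper: keep only (type, string) from every token."""
    return tokenize.untokenize(tok[:2] for tok in tokens)


source = "x = 1 + 2\n"  # sample input, chosen only for this example
full = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Simulate a caller that supplies 3-tuples: (type, string, start).
three_tuples = [tok[:3] for tok in full]

rebuilt = untokenize_first_two(three_tuples)
retokenized = [
    tok[:2] for tok in tokenize.generate_tokens(io.StringIO(rebuilt).readline)
]

# Spacing in `rebuilt` may differ from `source`, but the (type, string)
# pairs should survive the round trip.
print(repr(rebuilt))
print(retokenized == [tok[:2] for tok in full])
```

With only two elements per token, untokenize has no position information, so 
the exact whitespace is not expected to match the input.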

The attached patch should address the problems above. 

While preparing the patch, I found a bug in tokenize itself. Consider the new 
attached tab.py: the tabs between code and comments, and the tabs between 
expressions, are lost during tokenization. When untokenizing, the position 
information is therefore used to produce an equivalent number of spaces 
instead of tabs.
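
A small illustration of the lost tabs, as a sketch rather than part of the 
patch. The one-line source is made up for the example; it prints each token 
with its start and end positions.

```python
import io
import tokenize

source = "a\t=\t1\n"  # tabs between the tokens, made up for illustration
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    # tok.string never holds the separating tab; only the start/end
    # columns record how far apart the tokens were.
    print(tokenize.tok_name[tok.type], repr(tok.string), tok.start, tok.end)

# The raw physical line is still available on the token, tabs included.
print(repr(tokens[0].line))
```

The token strings and columns alone do not say whether a gap was a tab or a 
space, which is why rebuilding from positions falls back to spaces.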

Despite the tokenization problem, the patch produces syntactically correct 
code and reproduces the original as accurately as it can.
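
One possible way to check "syntactically correct" for a rebuilt source, 
sketched under the assumption that comparing ASTs is an acceptable test. The 
helper name same_syntax and both sample strings are hypothetical and are not 
taken from the patch.

```python
import ast


def same_syntax(original, rebuilt):
    """Hypothetical check: both sources parse to the same AST.

    ast.dump() omits line and column attributes by default, so
    whitespace-only differences (tabs versus spaces) do not matter.
    """
    return ast.dump(ast.parse(original)) == ast.dump(ast.parse(rebuilt))


with_tabs = "if 1:\n\tx = {'a': 1}\t# comment\n"
with_spaces = "if 1:\n        x = {'a': 1}  # comment\n"

print(same_syntax(with_tabs, with_spaces))
print(same_syntax(with_tabs, "if 1:\n    x = {'a': 2}\n"))
```

Comments are not part of the AST, so this check says nothing about whether 
they were preserved.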

PEP 8 recommends spaces for indentation, but tabs are still in use and should 
not be ignored.

new tab.py (in Python string):

'#!/usr/bin/env python\n# -*- coding: utf-8 -*-\n\ndef foo():\n\t"""\n\tTests tabs in tokenization\n\t\tfoo\n\t"""\n\tpass\n\tpass\n\tif 1:\n\t\t# not indent correctly\n\t\tpass\n\t\t# correct\ttab\n\t\tpass\n\tpass\n\tbaaz = {\'a\ttab\':\t1,\n\t\t\t\'b\': 2}\t\t# also fails\n\npass\n#if 2:\n\t#pass\n#pass\n'

----------
keywords: +patch
nosy: +gumblex
Added file: http://bugs.python.org/file39748/tokenize.patch

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue20387>
_______________________________________