Terry J. Reedy <tjre...@udel.edu> added the comment:

Hmm. Python 3 code is Unicode. "Python reads program text as Unicode code 
points." The tokenize module purports to provide "a lexical scanner for Python 
source code". But it seems not to do that. Instead it provides a scanner for 
Python code encoded as bytes, which is something different. So this is at least 
a doc update issue (which affects 2.7/3.2 also). Another doc issue is given 
below.

A deeper problem is that tokenize uses the semi-obsolete readline protocol, 
which probably dates to 1.0 and which expects the source to be a file or 
file-like. The more recent iterator protocol would let the source be anything. 
A modern tokenize function should accept an iterable of strings. This would 
include but not be limited to a file opened in text mode.
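
As a rough sketch of what accepting an iterable of strings could look like, 
the readline protocol can be adapted from any iterator. This assumes 
tokenize.generate_tokens() (the str-based tokenizer, undocumented in older 
3.x releases) is available; tokenize_lines is a hypothetical helper name, not 
part of the module.

```python
import tokenize

def tokenize_lines(lines):
    """Tokenize any iterable of str lines (hypothetical helper, not stdlib)."""
    it = iter(lines)
    # Adapt the iterator to the readline protocol: return '' once exhausted,
    # which is how readline() signals end of input.
    return tokenize.generate_tokens(lambda: next(it, ''))

# The source does not have to be a file: a plain list of strings works.
source_lines = ["x = 1\n", "print(x)\n"]
names = [tok.string for tok in tokenize_lines(source_lines)
         if tok.type == tokenize.NAME]
print(names)
```

The same helper would accept a file opened in text mode, since such a file 
is itself an iterable of lines.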

A related problem is that 'tokenize' is a convenience function that does 
several things bundled together.
1. Read lines as bytes from a file-like source.
2. Detect encoding.
3. Decode lines to strings.
4. Actually tokenize the strings to tokens.

I understand this feature request to be a request that function 4, the actual 
Python 3 code tokenizer, be unbundled and exposed to users. I agree with this 
request. Any user that starts with actual Py3 code would benefit.
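
To illustrate, the four steps listed above can be done one at a time with 
pieces that already exist in the module. This is only a sketch: it assumes 
tokenize.detect_encoding() and tokenize.generate_tokens() are usable as shown, 
and the sample source bytes are made up for the example.

```python
import io
import tokenize

# Made-up sample: Latin-1 encoded source with a coding cookie.
raw = b"# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n"

# 1. Read lines as bytes from a file-like source.
buffer = io.BytesIO(raw)

# 2. Detect the encoding (looks at no more than the first two lines).
encoding, _lines_read = tokenize.detect_encoding(buffer.readline)

# 3. Decode the whole source to str.
buffer.seek(0)
text = buffer.read().decode(encoding)

# 4. Tokenize the decoded text; no bytes are involved at this point.
tokens = list(tokenize.generate_tokens(io.StringIO(text).readline))
strings = [tok.string for tok in tokens if tok.type == tokenize.STRING]
print(encoding, strings)
```

A user who already has Py3 code as a str only needs step 4.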

(compile() is another function that bundles a tokenizer.)

Back to the current doc and another doc problem. The entry for untokenize() 
says "Converts tokens back into Python source code. ...The reconstructed script 
is returned as a single string." That would be nice if true, but I am going to 
guess it is not, as the entry continues "It returns bytes, encoded using the 
ENCODING token,". In Py3, string != bytes, so this seems an incomplete doc 
conversion from Py2.
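
A quick way to check what untokenize() actually returns is to feed it tokens 
from both entry points. This sketch assumes that the return type depends on 
whether the token stream starts with an ENCODING token, which is my reading of 
the behavior rather than something the quoted doc entry states clearly.

```python
import io
import tokenize

source = b"x = 1\n"

# tokenize.tokenize() reads bytes and emits an ENCODING token first.
byte_tokens = list(tokenize.tokenize(io.BytesIO(source).readline))
from_bytes = tokenize.untokenize(byte_tokens)

# generate_tokens() reads str, so there is no ENCODING token in the stream.
text = source.decode("utf-8")
str_tokens = list(tokenize.generate_tokens(io.StringIO(text).readline))
from_str = tokenize.untokenize(str_tokens)

print(type(from_bytes).__name__, type(from_str).__name__)
```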

----------
nosy: +terry.reedy

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12486>
_______________________________________
