New submission from Paul Sokolovsky <pfal...@users.sourceforge.net>:

Currently, it's possible:

* To get from the stream-of-characters program representation to the AST
representation (ast.parse()).
* To get from an AST to a code object (compile()).
* To get from a code object to a first-class function, to execute the program.
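
For illustration, the three existing stages can be sketched as follows. This is
only a minimal example: the sample source string, the "<example>" filename and
the variable names are made up for the demonstration, and wrapping a
module-level code object in types.FunctionType is just one of several ways to
run it (exec() would do as well).

```python
import ast
import builtins
import types

# Hypothetical sample program, used only for this demonstration.
source = "def add(a, b):\n    return a + b\n"

# Stage 1: stream of characters -> AST.
tree = ast.parse(source, filename="<example>", mode="exec")

# Stage 2: AST -> code object.
module_code = compile(tree, filename="<example>", mode="exec")

# Stage 3: code object -> first-class function, which is then called
# to execute the program in the given globals namespace.
namespace = {"__builtins__": builtins}
module_func = types.FunctionType(module_code, namespace)
module_func()

# The executed program defined add() in the namespace.
result = namespace["add"](2, 3)
```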

Python also offers the "tokenize" module, but it stands as a disconnected
island: the only thing it allows is to get from the stream-of-characters
program representation to a stream of tokens, and back. At the same time,
conceptually, tokenization is not a disconnected feature; it's the first stage
of the language processing pipeline. The fact that "tokenize" is disconnected
from the rest of the pipeline, as listed above, is more an artifact of the
CPython implementation: both the "ast" module and the compile() builtin are
backed by the underlying bytecode compiler implementation written in C, and
that's what connects them.
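
To make the "island" concrete, here is a small sketch of what "tokenize" offers
today: source text to tokens, and tokens back to source text. The sample
statement and the variable names are arbitrary and chosen only for this
example.

```python
import io
import tokenize

# Hypothetical sample statement, used only for this demonstration.
source = "x = 1 + 2\n"

# Characters -> tokens. generate_tokens() works on str input and takes a
# readline callable.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Tokens -> characters. There is no standard step from here to an AST.
roundtrip = tokenize.untokenize(tokens)

# Tokenizing the round-tripped text yields the same token types and strings.
tokens_again = list(tokenize.generate_tokens(io.StringIO(roundtrip).readline))
```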

On the other hand, the "tokenize" module is pure Python, while the underlying
compiler has its own tokenizer implementation (not exposed). That's the likely
reason for this disconnect between "tokenize" and the rest of the
infrastructure.

I propose to close that gap and establish an API that would allow parsing a
token stream (iterable) into an AST. An initial implementation for CPython can
(and likely should) be naive, making a round trip through the surface program
representation (i.e. turning the tokens back into source text and parsing
that). That's OK; again, the idea is to establish a standard API for going
tokens -> AST, which individual Python implementations can then
implement/optimize based on their needs.

The proposed name is ast.parse_tokens(). It follows the signature of the
existing ast.parse(), except that the first parameter is "token_stream"
instead of "source".
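
A naive implementation along these lines might look like the sketch below. It
is not an existing API: parse_tokens() is written here as a standalone
function, and its defaults simply mirror ast.parse() as an assumption. It goes
through the source text using tokenize.untokenize().

```python
import ast
import io
import tokenize


def parse_tokens(token_stream, filename="<unknown>", mode="exec", **kwargs):
    """Hypothetical sketch of the proposed ast.parse_tokens().

    Naive approach: rebuild the source text from the tokens, then hand it
    to the existing ast.parse().
    """
    # untokenize() returns str, or bytes if the stream starts with an
    # ENCODING token; ast.parse() accepts both.
    source = tokenize.untokenize(token_stream)
    return ast.parse(source, filename=filename, mode=mode, **kwargs)


# Usage: tokenize a sample program, then go tokens -> AST -> code -> result.
sample = "y = 6 * 7\n"
token_stream = tokenize.generate_tokens(io.StringIO(sample).readline)
tree = parse_tokens(token_stream, filename="<tokens>")

namespace = {}
exec(compile(tree, "<tokens>", "exec"), namespace)
```

A real implementation could skip the text round trip entirely and feed tokens
to the parser directly; the sketch only shows the shape of the API.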

Another alternative would be to overload the existing ast.parse() to accept a
token iterable. I guess that at the current stage, where we try to tighten up
the type strictness of the API and have clear typing signatures for API
functions, this is not the favored solution.

----------
components: Library (Lib)
messages: 383680
nosy: BTaskaya, pablogsal, pfalcon, serhiy.storchaka
priority: normal
severity: normal
status: open
title: tokenize, ast: No direct way to parse tokens into AST, a gap in the
language processing pipeline
versions: Python 3.10

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42729>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com