New submission from Łukasz Langa <luk...@langa.pl>:

lib2to3's token.py and tokenize.py were initially copies of the respective files from the standard library. They were copied to allow Python 3 to read Python 2's grammar.

Since 2006, lib2to3 has grown into a widely used Concrete Syntax Tree, including for parsing Python 3 code. Support for the Python 3 grammar was added, but sadly lib2to3's copies diverged from the main token.py and tokenize.py. This change brings them back together, reducing the differences to the bare minimum that lib2to3 actually requires. Before this change, almost every line in lib2to3/pgen2/tokenize.py was different from tokenize.py. After this change, the diff between the two files is only 175 lines long and consists entirely of relevant Python 2 compatibility bits.

Merging the implementations brings numerous fixes to the lib2to3 tokenizer:
+ docstrings made as similar as possible
+ ported `TokenInfo`
+ ported `tokenize.tokenize()` and `tokenize.open()`
+ removed Python 2-only implementation cruft
+ fixes Unicode identifier handling
+ fixes string prefix handling
+ fixes Ellipsis handling
+ Untokenizer backported bugfixes:
  - 5e6db313686c200da425a54d2e0c95fa40107b1d
  - 9dc3a36c849c15c227a8af218cfb215abe7b3c48
  - 5b8d2c3af76e704926cf5915ad0e6af59a232e61
  - e411b6629fb5f7bc01bec89df75737875ce6d8f5
  - BPO-2495
+ tokenizer no longer crashes on a missing newline at the end of the stream (added \Z, the end-of-string anchor, to PseudoExtras): BPO-16152
+ `find_cookie` includes the file name in error messages, if available
+ `find_cookie` raises SyntaxError on invalid encodings: BPO-14990

Improvements to lib2to3/pgen2/token.py:
+ taken from the current Lib/token.py
+ tokens renumbered to match Lib/token.py
+ `__all__` properly defined
+ ASYNC, AWAIT and BACKQUOTE exist under different numbers (100 + old number)
+ ELLIPSIS added
+ ENCODING added

----------
components: 2to3 (2.x to 3.x conversion tool), Library (Lib)
messages: 315639
nosy: lukasz.langa
priority: normal
severity: normal
status: open
title: [lib2to3] Synchronize token.py and tokenize.py with the standard library
versions: Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33338>
_______________________________________
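For reference, several of the standard-library behaviors this change ports into lib2to3 (`TokenInfo`, `tokenize.tokenize()`, the ENCODING and ELLIPSIS tokens, tolerance of a missing final newline, and SyntaxError on an invalid coding cookie) can be observed with Lib/tokenize.py directly. This is a minimal sketch assuming Python 3.8 or later. It exercises only the standard library `tokenize` module, not lib2to3's pgen2 tokenizer. The helper names `summarize` and `cookie_error` and the sample input are hypothetical.

```python
import io
import token
import tokenize

# Hypothetical sample input: a PEP 263 coding cookie, an Ellipsis literal,
# and no newline at the end of the stream.
SOURCE = b"# -*- coding: utf-8 -*-\nx = ...\nprint(x)"


def summarize(source):
    """Return (exact token name, token string) pairs for `source`."""
    # tokenize.tokenize() expects a readline callable that returns bytes
    # and yields TokenInfo named tuples.
    readline = io.BytesIO(source).readline
    return [
        (token.tok_name[tok.exact_type], tok.string)
        for tok in tokenize.tokenize(readline)
    ]


def cookie_error(source):
    """Return the SyntaxError message for an invalid coding cookie, or None."""
    try:
        # detect_encoding() runs the same cookie check as tokenize();
        # an unknown codec name raises SyntaxError rather than LookupError.
        tokenize.detect_encoding(io.BytesIO(source).readline)
    except SyntaxError as exc:
        return str(exc)
    return None


pairs = summarize(SOURCE)
print(pairs[0])                      # ('ENCODING', 'utf-8')
print(("ELLIPSIS", "...") in pairs)  # True
print(pairs[-1][0])                  # ENDMARKER
print(cookie_error(b"# coding: bogus-codec\n"))
```

Using `exact_type` rather than `type` is what distinguishes ELLIPSIS from the generic OP type that `tokenize` reports for every operator.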