New submission from Justin ARthur <justinart...@gmail.com>: Python 3 code with an identifier that has a non-spacing mark in it does not get tokenized by lib2to3 and will result in an exception thrown in the parsing process.
Parsing the attached file (badvar.py), results in `ParseError: bad token: type=58, value='̇', context=('', (1, 1))` This happens because the Name pattern regular expression in lib2to3 is `r'\w+'` and the word character class doesn't contain non-spacing marks (and possible other [continuation characters allowed in Python 3 identifiers](https://docs.python.org/3/reference/lexical_analysis.html#identifiers)). (reported by energizer in the Python IRC channel) ---------- components: 2to3 (2.x to 3.x conversion tool), Library (Lib) files: badvar.py messages: 351153 nosy: JustinTArthur priority: normal severity: normal status: open title: lib2to3 doesn't parse Python 3 identifiers containing non-spacing marks versions: Python 3.5, Python 3.6, Python 3.7, Python 3.8 Added file: https://bugs.python.org/file48592/badvar.py _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue38032> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com