[issue38032] lib2to3 doesn't parse Python 3 identifiers containing non-spacing marks

Justin ARthur Wed, 04 Sep 2019 16:22:47 -0700

New submission from Justin ARthur <[email protected]>:

Python 3 code containing an identifier with a non-spacing mark cannot be 
tokenized by lib2to3; parsing it raises an exception.


Parsing the attached file (badvar.py) results in
`ParseError: bad token: type=58, value='̇', context=('', (1, 1))`

This happens because the Name pattern regular expression in lib2to3 is 
`r'\w+'`, and the `\w` character class does not match non-spacing marks (and 
possibly other [continuation characters allowed in Python 3 
identifiers](https://docs.python.org/3/reference/lexical_analysis.html#identifiers)).
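
A minimal sketch of the mismatch, using only `re` and `unicodedata` rather than lib2to3 itself. The identifier below (`a` followed by U+0307 COMBINING DOT ABOVE) is an assumption chosen for illustration and is not necessarily what badvar.py contains:

```python
import re
import unicodedata

# Hypothetical identifier: "a" followed by U+0307 COMBINING DOT ABOVE.
name = "a\u0307"
mark = name[1]

# General category of the trailing character ("Mn" = non-spacing mark).
category = unicodedata.category(mark)

# Python 3's own check accepts the whole string as an identifier.
is_identifier = name.isidentifier()

# The \w+ pattern stops before the mark, leaving it unconsumed.
matched = re.match(r"\w+", name).group()
leftover = name[len(matched):]

print(category, is_identifier, ascii(matched), ascii(leftover))
```

The unconsumed mark is presumably what surfaces as the bad token in the error above.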

(reported by energizer in the Python IRC channel)

----------
components: 2to3 (2.x to 3.x conversion tool), Library (Lib)
files: badvar.py
messages: 351153
nosy: JustinTArthur
priority: normal
severity: normal
status: open
title: lib2to3 doesn't parse Python 3 identifiers containing non-spacing marks
versions: Python 3.5, Python 3.6, Python 3.7, Python 3.8
Added file: https://bugs.python.org/file48592/badvar.py

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue38032>
_______________________________________