[issue24194] tokenize fails on some Other_ID_Start or Other_ID_Continue

Terry J. Reedy Wed, 14 Mar 2018 17:56:20 -0700

Terry J. Reedy <[email protected]> added the comment:

I closed #1693050 as a duplicate of #12731 (the \w issue).  I left #9712 closed 
and closed #32987, and marked both as duplicates of this issue.


In msg313814 of the latter, Serhiy indicates which start and continue 
identifier characters are currently matched by \W for re and regex.  He gives 
there a fix for this that he says requires the \w issue to be fixed.  It is 
similar to the posted patch.  He says that without \w fixed, another 2000+ 
chars need to be added.  Perhaps the v0 patch needs more tests (I don't know).
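
To see the gap for yourself, here is a rough sketch (not Serhiy's fix and 
not the posted patch) that lists characters str.isidentifier() accepts as 
identifier-continue characters but that re's \w does not match.  The helper 
name and the default range limit are made up for illustration, and the exact 
result depends on the Unicode database of the Python you run it on.

```python
import re
import sys

WORD = re.compile(r"\w")


def id_chars_missed_by_word_class(limit=sys.maxunicode + 1):
    """Return chars below `limit` valid inside an identifier but not matched by \\w.

    Hypothetical helper for illustration only.
    """
    missed = []
    for cp in range(limit):
        ch = chr(cp)
        # "a" + ch is an identifier exactly when ch is a valid
        # identifier-continue character.
        if ("a" + ch).isidentifier() and not WORD.fullmatch(ch):
            missed.append(ch)
    return missed


if __name__ == "__main__":
    missed = id_chars_missed_by_word_class()
    print(len(missed), "identifier characters are not matched by \\w")
    # Show a few of them as escapes so they survive any terminal or font.
    print([ch.encode("unicode_escape").decode("ascii") for ch in missed[:10]])
```

Restricting the scan with a smaller limit (for example 0x3100) keeps it fast 
while still covering U+00B7 and U+2118.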

He also says that re support for properties, #12734, would make things even 
better.

Three of the characters in the patch are too obscure for Firefox on Windows and 
print as boxes.  Some others I do not recognize.  And I could not type any of 
them.  I thought we had a policy of using \u or \U escapes even in tests to 
avoid such problems.  (I notice that there are already non-ascii chars in the 
context.)
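
As an illustration of what I mean by escapes, a test input can be spelled 
with \u escapes so the source stays pure ASCII.  This is only a sketch: the 
function name is made up, and U+2118 is just one Other_ID_Start character 
picked as an example.  Which token type comes back for the identifier 
depends on the Python version, so nothing is assumed about that here.

```python
import io
import tokenize

# U+2118 SCRIPT CAPITAL P, written as an escape so the file stays ASCII.
SOURCE = "\u2118 = 1\n"


def token_type_names(source):
    """Tokenize `source` and return the token type names in order.

    Hypothetical helper for illustration only.
    """
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]


if __name__ == "__main__":
    # The first entry is NAME where tokenize accepts the character and
    # ERRORTOKEN where it hits the failure described in this issue.
    print(token_type_names(SOURCE))
```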

----------
nosy: +terry.reedy
title: tokenize yield an ERRORTOKEN if an identifier uses Other_ID_Start or 
Other_ID_Continue -> tokenize fails on some Other_ID_Start or Other_ID_Continue
versions: +Python 3.7, Python 3.8 -Python 3.5

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue24194>
_______________________________________