Terry J. Reedy <tjre...@udel.edu> added the comment:

Whatever I may have said before, I favor supporting the Unicode standard for \w, which is related to the standard for identifiers.

This is one of two issues about \w being defined too narrowly. I am somewhat arbitrarily closing this one as a duplicate of #12731 (fewer digits ;-). There are three issues about tokenize.tokenize failing on valid identifiers, defined as \w sequences whose first char is itself an identifier (and therefore a start char). In msg313814 of #32987, Serhiy indicates which identifier start and continue characters are matched by \W in re and regex. I am leaving #24194 open as the tokenizer name issue.

----------
resolution:  -> duplicate
stage:  -> resolved
status: open -> closed
superseder:  -> tokenize yield an ERRORTOKEN if an identifier uses Other_ID_Start or Other_ID_Continue

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue1693050>
_______________________________________
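As a minimal illustration of the mismatch (my own example, not taken from the message above): U+2118, SCRIPT CAPITAL P, carries the Other_ID_Start property, so CPython accepts it in an identifier, yet re's \w does not match it.

```python
import re

# U+2118 (SCRIPT CAPITAL P) is in Other_ID_Start, so Python treats it
# as a valid identifier start character.
ch = "\u2118"

print(ch.isidentifier())    # True: the language accepts it in names
print(re.match(r"\w", ch))  # None: re's \w considers it a non-word char
```

So a character the compiler accepts in a name is classified as \W by re, which is the narrowness being discussed.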
This is one of 2 issues about \w being defined too narrowly. I am somewhat arbitrarily closing this as a duplicate of #12731 (fewer digits ;-). There are 3 issues about tokenize.tokenize failing on valid identifiers, defined as \w sequences whose first char is an identifier itself (and therefore a start char). In msg313814 of #32987, Serhiy indicates which start and continue identifier characters are matched by \W for re and regex. I am leaving #24194 open as the tokenizer name issue. ---------- resolution: -> duplicate stage: -> resolved status: open -> closed superseder: -> tokenize yield an ERRORTOKEN if an identifier uses Other_ID_Start or Other_ID_Continue _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue1693050> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com