Terry J. Reedy <tjre...@udel.edu> added the comment:

Whatever I may have said before, I favor supporting the Unicode standard for \w, which is related to the standard for identifiers.

This is one of two issues about \w being defined too narrowly. I am somewhat arbitrarily closing this one as a duplicate of #12731 (fewer digits ;-). There are three issues about tokenize.tokenize failing on valid identifiers, defined as \w sequences whose first char is itself an identifier (and therefore a start char). In msg313814 of #32987, Serhiy indicates which identifier start and continue characters are matched by \W in re and regex. I am leaving #24194 open as the tokenizer name issue.

----------
resolution:  -> duplicate
stage:  -> resolved
status: open -> closed
superseder:  -> tokenize yield an ERRORTOKEN if an identifier uses Other_ID_Start or Other_ID_Continue

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue1693050>
_______________________________________
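As a minimal illustration of the mismatch (my own example, not taken from the message above): U+2118, SCRIPT CAPITAL P, carries the Other_ID_Start property, so CPython accepts it in an identifier, yet re's \w does not match it.

```python
import re

# U+2118 (SCRIPT CAPITAL P) is in Other_ID_Start, so Python treats it
# as a valid identifier start character.
ch = "\u2118"

print(ch.isidentifier())    # True: the language accepts it in names
print(re.match(r"\w", ch))  # None: re's \w considers it a non-word char
```

So a character the compiler accepts in a name is classified as \W by re, which is the narrowness being discussed.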
This is one of 2 issues about \w being defined too narrowly. I am somewhat arbitrarily closing this as a duplicate of #12731 (fewer digits ;-). There are 3 issues about tokenize.tokenize failing on valid identifiers, defined as \w sequences whose first char is an identifier itself (and therefore a start char). In msg313814 of #32987, Serhiy indicates which start and continue identifier characters are matched by \W for re and regex. I am leaving #24194 open as the tokenizer name issue. ---------- resolution: -> duplicate stage: -> resolved status: open -> closed superseder: -> tokenize yield an ERRORTOKEN if an identifier uses Other_ID_Start or Other_ID_Continue _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue1693050> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com