Nick Coghlan added the comment:

Looking at issue 2382, I agree that's a different problem (I'm seeing the 
current misbehaviour even though everything is consistently encoded as UTF-8).

The main case we're interested in here is the PyUnicode_IsIdentifier one, so if 
we wanted to do better than "start or end of the token", we could introduce a 
new internal "_PyUnicode_FindNonIdentifier" that reported the position of the 
first non-identifier character (or -1 if it's a valid identifier).
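For illustration only, here is a rough Python-level sketch of the behaviour such a helper might have. It assumes the same rule PyUnicode_IsIdentifier applies: the first character must be XID_Start or an underscore, and the rest must be XID_Continue. The sketch leans on str.isidentifier (which CPython backs with PyUnicode_IsIdentifier) one character at a time. The name find_non_identifier is hypothetical, and returning 0 for an empty string is an assumption. A real _PyUnicode_FindNonIdentifier would be written in C.

```python
def find_non_identifier(name: str) -> int:
    """Hypothetical sketch of the proposed _PyUnicode_FindNonIdentifier.

    Returns the index of the first character that stops `name` from being
    a valid identifier, or -1 if `name` is a valid identifier.
    """
    if not name:
        # Assumption: an empty string is reported as failing at position 0.
        return 0
    for pos, ch in enumerate(name):
        if pos == 0:
            # A one-character string is an identifier iff ch is XID_Start or "_".
            ok = ch.isidentifier()
        else:
            # Prefix a known start character so only XID_Continue is tested.
            ok = ("a" + ch).isidentifier()
        if not ok:
            return pos
    return -1


# Usage: put a caret under the offending character rather than at the
# start or end of the whole token.
token = "spam$eggs"
pos = find_non_identifier(token)
print(token)
print(" " * pos + "^")  # caret lands under "$" (index 4)
```

With a position like this available, the tokenizer error could point at the exact character instead of only the token boundaries.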

Unfortunately, I'm not at all familiar with parsetok.c myself (my own work with 
the code generator has been from the AST on), so I don't have a ready answer 
for your other questions.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27582>
_______________________________________
_______________________________________________
Python-bugs-list mailing list