On Tue, Apr 1, 2014 at 7:44 AM, Chris Angelico <ros...@gmail.com> wrote:
> On Wed, Apr 2, 2014 at 12:33 AM, Ned Batchelder <n...@nedbatchelder.com>
> wrote:
>> Maybe I'm misunderstanding the discussion... It seems like we're talking
>> about a hypothetical definition of identifiers based on Unicode character
>> categories, but there's no need: Python 3 has defined precisely that. From
>> the docs
>> (https://docs.python.org/3/reference/lexical_analysis.html#identifiers):
>>
>
> "Python 3.0 introduces **additional characters** from outside the
> ASCII range" - emphasis mine.
>
> Python currently has - at least, per that documentation - a hybrid
> system with ASCII characters defined in the classic way, and non-ASCII
> characters defined by their Unicode character classes. I'm talking
> about a system that's _purely_ defined by Unicode character classes.
> It may turn out that the class list exactly compasses the ASCII
> characters listed, though, in which case you'd be right: it's not
> hypothetical.

The only ASCII exception is _, which is explicitly permitted to start an identifier (for obvious reasons), whereas characters in category Pc are otherwise only permitted to continue identifiers.

There are also explicit lists of extra permitted characters in PropList.txt (the Other_ID_Start and Other_ID_Continue properties) for backward compatibility: once a character is permitted, it should remain permitted even if its Unicode category changes. There are currently 4 extra starting characters and 12 extra continuing characters, but none of these are ASCII.
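
For anyone who wants to check the ASCII claim themselves, here is a rough sketch. It assumes the category lists given in the Python 3 reference (Lu, Ll, Lt, Lm, Lo, Nl for start characters, plus Mn, Mc, Nd, Pc for continuation) and uses str.isidentifier() as a stand-in for what the tokenizer accepts. The variable names are mine, and the two non-ASCII characters at the end are just examples I believe come from the Other_ID_Start and Other_ID_Continue lists, so treat it as an illustration rather than a definitive test.

```python
import unicodedata

# Categories taken from the Python 3 reference; names here are illustrative.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}

ascii_chars = [chr(i) for i in range(128)]

# What the interpreter accepts, using str.isidentifier() as a proxy.
can_start = {c for c in ascii_chars if c.isidentifier()}
can_continue = {c for c in ascii_chars if ("a" + c).isidentifier()}

# What the Unicode general categories alone would allow.
start_by_category = {
    c for c in ascii_chars if unicodedata.category(c) in START_CATEGORIES
}
continue_by_category = {
    c for c in ascii_chars if unicodedata.category(c) in CONTINUE_CATEGORIES
}

# ASCII characters accepted by the interpreter but not covered by categories.
start_exceptions = can_start - start_by_category
continue_exceptions = can_continue - continue_by_category

print("start exceptions:", sorted(start_exceptions))
print("continue exceptions:", sorted(continue_exceptions))

# Assumed examples of the extra non-ASCII characters: U+2118 (category Sm)
# may start an identifier; U+00B7 (category Po) may only continue one.
print("\u2118".isidentifier(), "\u00b7".isidentifier(), "a\u00b7".isidentifier())
```

If the category lists really do cover ASCII apart from the underscore, the first set should contain only _ and the second should be empty.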