Martin v. Löwis added the comment:

The reason the Unicode consortium made this list (Other_ID_Start) is that they 
want to promise 100% backwards compatibility: if a programming language uses 
UAX#31, changes to the Unicode database could otherwise break existing code. 
To avoid this, UAX#31 guarantees 100% stability.

Python uses it because it follows UAX#31 with the minimum number of 
modifications. We really shouldn't be making arbitrary changes to it. If we 
were to say, for example, that we drop these four characters now, the next 
Unicode version might add more characters to Other_ID_Start, and then we would 
have to say that we include some, but not all, characters from Other_ID_Start.
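
To illustrate the point, here is a small sketch showing that Python accepts 
Other_ID_Start characters as identifiers even though their general category is 
not a letter category. The two code points used (U+2118 and U+212E) are the 
ones I believe are listed under Other_ID_Start in PropList.txt; treat that as 
an assumption and check the file for the Unicode version you care about.

```python
import unicodedata

# Assumed Other_ID_Start members (verify against PropList.txt).
samples = ["\u2118", "\u212e"]

# Collect (code point, general category, accepted as identifier) per character.
results = [
    (f"U+{ord(ch):04X}", unicodedata.category(ch), ch.isidentifier())
    for ch in samples
]

for code_point, category, ok in results:
    print(code_point, category, ok)
```

Both characters are symbols by category, so a check based only on letter 
categories would reject them.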

So if IDLE wants to reimplement the XID_Start and XID_Continue properties, it 
should do it correctly. Note that the proposed patch only manages to replicate 
the ID_Start and ID_Continue properties. For the XID versions, see

http://www.unicode.org/reports/tr31/#NFKC_Modifications

Unfortunately, the specification doesn't explain exactly how these 
modifications are performed. For item 1, I think it is:

Characters that are in ID_Start (because they count as letters) but whose NFKC 
decomposition does not start with an ID_Start character (because it starts with 
a modifier instead) are removed from XID_Start.
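
A minimal sketch of that guess, not the consortium's actual algorithm: the 
helper names (is_id_start, guess_is_xid_start) are hypothetical, ID_Start is 
only approximated from general categories plus an assumed list of four 
Other_ID_Start characters, and the Pattern_Syntax and Pattern_White_Space 
exclusions are ignored.

```python
import unicodedata

# Assumed Other_ID_Start members (verify against PropList.txt).
OTHER_ID_START = {"\u2118", "\u212e", "\u309b", "\u309c"}

# General categories that count as identifier starts: letters and letter numbers.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}


def is_id_start(ch):
    """Approximate ID_Start for a single character."""
    return unicodedata.category(ch) in ID_START_CATEGORIES or ch in OTHER_ID_START


def guess_is_xid_start(ch):
    """Guess XID_Start: ID_Start, and the NFKC form also begins with ID_Start."""
    if not is_id_start(ch):
        return False
    nfkc = unicodedata.normalize("NFKC", ch)
    return bool(nfkc) and is_id_start(nfkc[0])


# U+0E33 THAI CHARACTER SARA AM is a letter, but its NFKC form starts
# with a combining mark, so the guess excludes it.
for ch in ("a", "\u0e33", "\u037a", "\u309b"):
    print(f"U+{ord(ch):04X}", is_id_start(ch), guess_is_xid_start(ch))
```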

For item 2, they unfortunately don't list all characters that get excluded. For 
the one example that they do give, the reason is clear: U+037A (GREEK 
YPOGEGRAMMENI, category Lm) decomposes to U+0020 (SPACE) U+0345 (COMBINING 
GREEK YPOGEGRAMMENI). Having a space in an identifier is clearly out of the 
question. I assume similar problems occur with "certain Arabic presentation 
forms". I wish the consortium were more explicit about the precise algorithms 
they use to derive their derived properties.
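
The U+037A case can be checked directly with the standard unicodedata module. 
This is only a sketch for inspecting one character; the variable names are 
mine.

```python
import unicodedata

ch = "\u037a"  # GREEK YPOGEGRAMMENI

# NFKC-normalize the character and list the resulting code points.
nfkc = unicodedata.normalize("NFKC", ch)
code_points = [f"U+{ord(c):04X}" for c in nfkc]

print(unicodedata.category(ch))  # general category of the original character
print(code_points)               # code points after NFKC normalization
print(ch.isidentifier())         # whether Python accepts it as an identifier
```

The same loop could be run over the Arabic presentation forms to see which of 
them normalize to something starting with a space.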

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21765>
_______________________________________