Ezio Melotti added the comment:

> _ID_FIRST_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl",
>                         "Other_ID_Start"}
> _ID_CATEGORIES = _ID_FIRST_CATEGORIES | {"Mn", "Mc", "Nd", "Pc",
>                                          "Other_ID_Continue"}

Note that "Other_ID_Start" and "Other_ID_Continue" are not categories -- they are properties -- and that unicodedata.category() won't return them, so adding them to these sets won't have any effect. I don't think there's a way to check if chars have that property, but as I said in my previous message it's probably safe to ignore them (nothing will explode even in the unlikely case that those chars are used, right?).

> def is_id_char(char):
>     return char in _ASCII_ID_CHARS or (
>         ord(char) >= 128 and

What's the reason for checking if the ord is >= 128?

>         category(normalize(char)[0]) in _ID_CATEGORIES
>     )

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21765>
_______________________________________
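
To illustrate the point about properties versus categories, here is a minimal sketch. It assumes U+2118 (SCRIPT CAPITAL P) as the sample character, since PropList.txt lists it under Other_ID_Start. The helper names are hypothetical and not part of the patch under review. Using str.isidentifier() as an indirect check is also an assumption: it reports what Python's own identifier rules accept, not the raw Unicode property.

```python
import unicodedata

# General categories from the quoted patch, without the two property names.
# unicodedata.category() only ever returns two-letter general categories,
# so "Other_ID_Start" could never match anyway.
ID_FIRST_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}


def category_says_id_start(char):
    """Hypothetical helper: category-only test for an identifier start."""
    return unicodedata.category(char) in ID_FIRST_CATEGORIES


def python_says_id_start(char):
    """Hypothetical helper: ask Python whether char alone is an identifier."""
    return char.isidentifier()


# Assumed sample: U+2118 SCRIPT CAPITAL P, an Other_ID_Start character.
sample = "\u2118"

# Its general category is a symbol category, so the category-only test
# rejects it even though Python accepts it as an identifier start.
print(unicodedata.category(sample))
print(category_says_id_start(sample), python_says_id_start(sample))
```

If the category-based check ever needs to agree with the tokenizer for such characters, falling back to str.isidentifier() for non-ASCII input is one option. For the rare characters involved, ignoring them as suggested above is simpler.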