Ezio Melotti <ezio.melo...@gmail.com> added the comment:

The attached patch changes Tools/unicode/makeunicodedata.py to create a list of names and codepoints taken from http://www.unicode.org/Public/6.0.0/ucd/NameAliases.txt and adds it to Modules/unicodename_db.h. During the lookup, the _getcode function at Modules/unicodedata.c:1055 loops over the 11 aliases and checks whether any of them matches. The patch also includes tests for both unicodedata.lookup and \N{}.
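
For readers who want to see the idea without reading the C code, here is a rough Python sketch of an alias lookup done as a linear scan. It is not the code from the patch. The helper names (parse_aliases, lookup_with_aliases), the two-field "codepoint;alias" line layout, and the choice to check aliases before the regular name table are all assumptions made for illustration. The sample data is a short excerpt, not the full list of 11 aliases.

```python
import unicodedata

# Sample lines in the assumed "codepoint;alias" layout of NameAliases.txt.
# This is an illustrative excerpt, not the complete file.
SAMPLE_ALIASES = """\
# comment lines and blank lines are skipped
01A2;LATIN CAPITAL LETTER GHA
01A3;LATIN SMALL LETTER GHA
"""


def parse_aliases(text):
    """Return a list of (alias, codepoint) pairs parsed from text."""
    aliases = []
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split(";")
        aliases.append((fields[1].strip(), int(fields[0], 16)))
    return aliases


def lookup_with_aliases(name, aliases):
    """Scan the alias list linearly, then fall back to unicodedata.lookup.

    The order (aliases first) is an assumption of this sketch.
    """
    wanted = name.upper()
    for alias, code in aliases:
        if alias == wanted:
            return chr(code)
    # Raises KeyError if the name is unknown.
    return unicodedata.lookup(name)


if __name__ == "__main__":
    table = parse_aliases(SAMPLE_ALIASES)
    print(ascii(lookup_with_aliases("LATIN CAPITAL LETTER GHA", table)))
    print(ascii(lookup_with_aliases("LATIN CAPITAL LETTER OI", table)))
```

A linear scan is reasonable for a list this short. A larger alias table would probably call for a dict or a sorted array with binary search.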

I'm not sure this is the best way to implement this, and someone will probably want to review and tweak both the approach and the C code, but it works fine:

>>> "\N{LATIN CAPITAL LETTER GHA}"
'Ƣ'
>>> import unicodedata
>>> unicodedata.lookup("LATIN CAPITAL LETTER GHA")
'Ƣ'
>>> "\N{LATIN CAPITAL LETTER OI}"
'Ƣ'
>>> unicodedata.lookup("LATIN CAPITAL LETTER OI")
'Ƣ'

The patch doesn't include changes for NamedSequences.txt.

----------
keywords: +patch
nosy: +lemburg, loewis
Added file: http://bugs.python.org/file23273/issue12753.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12753>
_______________________________________