[issue12753] \N{...} neglects formal aliases and named sequences from Unicode charnames namespace

Ezio Melotti Fri, 30 Sep 2011 01:59:19 -0700

Ezio Melotti <[email protected]> added the comment:

The attached patch changes Tools/unicode/makeunicodedata.py to create a list of 
names and codepoints taken from 
http://www.unicode.org/Public/6.0.0/ucd/NameAliases.txt and adds it to 
Modules/unicodename_db.h.
During lookup, the _getcode function at Modules/unicodedata.c:1055 loops 
over the 11 aliases and checks whether any of them matches.
The patch also includes tests for both unicodedata.lookup and \N{}.
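
As a rough illustration of the fallback described above, here is a Python sketch (not the C code from the patch). The names SAMPLE, parse_aliases and lookup_with_aliases are hypothetical and exist only for this example; the sample data assumes the "codepoint;alias" line layout of NameAliases.txt and contains only the one alias used in this message.

```python
import unicodedata

# One sample line in the assumed "codepoint;alias" layout of NameAliases.txt.
SAMPLE = """\
# comment lines and blank lines are skipped
01A2;LATIN CAPITAL LETTER GHA
"""


def parse_aliases(text):
    """Return a dict mapping alias name -> code point (hypothetical helper)."""
    aliases = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split(";")
        aliases[fields[1].strip()] = int(fields[0], 16)
    return aliases


def lookup_with_aliases(name, aliases):
    """Try the regular name lookup first, then fall back to the alias table."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        pass
    key = name.upper()  # character names are matched case-insensitively
    if key in aliases:
        return chr(aliases[key])
    raise KeyError("undefined character name %r" % name)


aliases = parse_aliases(SAMPLE)
print(ascii(lookup_with_aliases("LATIN CAPITAL LETTER GHA", aliases)))
print(ascii(lookup_with_aliases("LATIN CAPITAL LETTER OI", aliases)))
```

Both calls should yield U+01A2. If the installed unicodedata already resolves the alias by itself, the fallback branch is simply never reached. With only 11 entries, a linear scan as in the patch or a dict as in this sketch should make little practical difference.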


I'm not sure this is the best way to implement this, and someone will probably 
want to review and tweak both the approach and the C code, but it works fine:
>>> "\N{LATIN CAPITAL LETTER GHA}"
'Ƣ'
>>> import unicodedata
>>> unicodedata.lookup("LATIN CAPITAL LETTER GHA")
'Ƣ'
>>> "\N{LATIN CAPITAL LETTER OI}"
'Ƣ'
>>> unicodedata.lookup("LATIN CAPITAL LETTER OI")
'Ƣ'

The patch doesn't include changes for NamedSequences.txt.

----------
keywords: +patch
nosy: +lemburg, loewis
Added file: http://bugs.python.org/file23273/issue12753.diff

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue12753>
_______________________________________