Hi all,

I'd like to ask about the usage of Unicode data on a narrow Python build. Unicode string literals with \N{name} work even without an (explicit) import of unicodedata, and they also correctly handle the "wider" Unicode planes, above U+FFFF:

>>> u"\N{LATIN SMALL LETTER E}"
u'e'
>>> u"\N{GOTHIC LETTER AHSA}"
u'\U00010330'

The unicodedata functions work analogously in the basic plane, but behave differently above it:

>>> unicodedata.lookup("LATIN SMALL LETTER E")
u'e'
>>> unicodedata.lookup("GOTHIC LETTER AHSA")
u'\u0330'

(the leading 0001 gets trimmed)

Is this a bug in unicodedata, or is it the expected behaviour on a narrow build?

Another problem I have is accessing the "characters" and their properties by their respective code points: below U+FFFF it is possible to use unichr(), but that isn't valid for higher values on a narrow build. It is possible to derive the code point from the surrogate pair, which would also be usable for the wider code points.

Currently, I'm using a kind of parallel database for some Unicode ranges above U+FFFF, but I don't think this is the most effective way. I actually found something similar at http://inamidst.com/phenny/modules/codepoint.py, which uses UnicodeData.txt directly, but I was wondering if there is a simpler way of doing that; it seems obvious that the data are present, if they can be used for constructing unicode literals.

Any hints are welcome, thanks.

vbr
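
P.S. To make the surrogate-pair arithmetic mentioned above concrete, here is a minimal sketch. It works on plain integers only, so it does not depend on the build type. The function names are my own (hypothetical) and are not part of unicodedata; the constants are the standard UTF-16 surrogate ranges.

```python
def codepoint_from_surrogates(high, low):
    """Combine a UTF-16 surrogate pair (two integers) into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 bits of the offset above U+10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def surrogates_from_codepoint(cp):
    """Split a code point above U+FFFF into (high, low) surrogate integers."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# GOTHIC LETTER AHSA is U+10330, i.e. the pair D800 DF30.
print(hex(codepoint_from_surrogates(0xD800, 0xDF30)))
print([hex(unit) for unit in surrogates_from_codepoint(0x10330)])
```

On a narrow build, the two integers would come from calling ord() on each half of the two-character string that the \N{...} literal produces. This only covers the conversion itself; looking up names or properties for such code points is still the open question.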