Hi, thanks for the answer,

> From: Gabriel Genellina <[EMAIL PROTECTED]>
> Subj: Re: unicode data - accessing codepoints > FFFF on narrow python builts
> Date: 18.4.2007 21:33:11
> ----------------------------------------
>
> py> x=u"\N{GOTHIC LETTER AHSA}"
> py> ord(x)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: ord() expected a character, but string of length 2 found
> py> unicodedata.name(x)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: need a single Unicode character as parameter
> py> len(x)
> 2
> py> list(x)
> [u'\ud800', u'\udf30']
>
> That looks like UTF-16 (?) but seen as two characters instead of one.
> Probably in a 32bits build Python should refuse to use such character (and
> limit Unicode support to the basic plane?) (or not?) (if not, what's the
> point of sys.maxunicode?) (enough parenthesis for now).
>
> --
> Gabriel Genellina
>

Yes, this is a UTF-16 surrogate pair, which is, as far as I know, the usual way characters outside the basic plane are handled on narrow Python builds. There are some problems with it, but most of the things I need to do with non-basic-plane characters work this way (GUI display, saving text as UTF-8), so I wouldn't be happy if this support were removed.

The problem is access to unicodedata, which requires "a string of length 1"; I thought it might also accept the code point number, but that doesn't seem to be possible.

Thanks again.
vbr - Vlastimil Brom
--
http://mail.python.org/mailman/listinfo/python-list
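
For reference, a minimal sketch of the surrogate-pair arithmetic discussed above, using the standard UTF-16 formula. The function names are hypothetical helpers invented for illustration, not part of Python or unicodedata. Note that this only recovers the code point number from the two code units; it does not by itself make unicodedata.name() accept the pair on a narrow build.

```python
def surrogate_pair_to_codepoint(high, low):
    """Combine two UTF-16 code units (one-character strings) into a code point.

    Hypothetical helper, for illustration only.
    """
    hi, lo = ord(high), ord(low)
    # High surrogates are U+D800..U+DBFF, low surrogates are U+DC00..U+DFFF.
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a valid UTF-16 surrogate pair")
    # Each surrogate carries 10 bits of the offset above U+10000.
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


def codepoint_to_surrogate_pair(cp):
    """Split a code point above U+FFFF into (high, low) surrogate values.

    Hypothetical helper, for illustration only.
    """
    if not (0x10000 <= cp <= 0x10FFFF):
        raise ValueError("code point is not outside the basic plane")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# The two code units shown by list(x) in the quoted session:
cp = surrogate_pair_to_codepoint(u"\ud800", u"\udf30")
print(hex(cp))  # 0x10330, the code point of GOTHIC LETTER AHSA
```

The reverse helper is included only to show that the mapping is lossless in both directions; for the pair above it gives back (0xD800, 0xDF30).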