On Wed, Oct 21, 2009 at 12:35:11PM EDT, Nobody wrote: [..]
> Characters outside the 16-bit range aren't supported on all builds.
> They won't be supported on most Windows builds, as Windows uses 16-bit
> Unicode extensively:

I knew nothing about UTF-16 & friends before this thread. Best part of
Unicode is that there are multiple encodings, right? ;-)

Moot point on xterm anyway, since you'd be hard put to it to find a
decent terminal font that covers anything outside the BMP.

> Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit
> (Intel)] on win32
> >>> unichr(0x10000)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: unichr() arg not in range(0x10000) (narrow Python build)
>
> Note that narrow builds do understand names outside of the BMP, and
> generate surrogate pairs for them:
>
> >>> u'\N{LINEAR B SYLLABLE B008 A}'
> u'\U00010000'
> >>> len(_)
> 2
>
> Whether or not using surrogates in this context is a good idea is open to
> debate. What's the advantage of a multi-wchar string over a multi-byte
> string?

I don't understand this last remark, but since I'm only a GNU/Linux
hobbyist, I guess it doesn't make much difference.

Thanks for the code snippet and comments.

CJ

-- 
http://mail.python.org/mailman/listinfo/python-list
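
For anyone puzzling over why len(_) came back as 2 in the quoted session, here is a minimal sketch of the arithmetic behind a UTF-16 surrogate pair. It assumes Python 3 (which no longer has narrow builds as of 3.3), and the helper names to_surrogate_pair and utf16_units are made up for illustration, not part of any library.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000          # 20 bits of payload remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def utf16_units(text):
    """Count the 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


high, low = to_surrogate_pair(0x10000)
print(hex(high), hex(low))        # 0xd800 0xdc00
print(utf16_units("\U00010000"))  # 2
```

The count of 2 code units for U+10000 corresponds to the length of 2 that the narrow build reported for the same character.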