On Wed, 21 Oct 2009 05:16:56 -0400, Chris Jones wrote:

>> > Where are the literals (i.e. u'\N{DEGREE SIGN}') defined?
>>
>> You can get them from the unicodedata module, e.g.:
>>
>>     import unicodedata
>>     for i in xrange(0x10000):
>>         n = unicodedata.name(unichr(i),None)
>>         if n is not None:
>>             print i, n
>
> Python rocks!
>
> Just curious, why did you choose to set the upper boundary at 0xffff?

Characters outside the 16-bit range (the Basic Multilingual Plane, or BMP)
aren't supported on all builds. They won't be supported on most Windows
builds, as Windows uses 16-bit Unicode extensively:

    Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
    >>> unichr(0x10000)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: unichr() arg not in range(0x10000) (narrow Python build)

Note that narrow builds do understand names outside of the BMP, and
generate surrogate pairs for them:

    >>> u'\N{LINEAR B SYLLABLE B008 A}'
    u'\U00010000'
    >>> len(_)
    2

Whether or not using surrogates in this context is a good idea is open to
debate. What's the advantage of a multi-wchar string over a multi-byte
string?
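
For anyone who wants to see where the length of 2 comes from, here is a rough sketch of the surrogate-pair arithmetic, plus a variant of the name-listing loop that takes its upper bound from sys.maxunicode instead of a hard-coded 0x10000. It is written for Python 3 (chr() and range() in place of unichr() and xrange()), so it is not the code from the thread; the helper names to_surrogate_pair and named_characters are made up for illustration.

```python
import sys
import unicodedata


def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 (high, low) surrogates.

    Hypothetical helper for illustration; not part of the standard library.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000          # 20 bits remain after the offset
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def named_characters(limit=None):
    """Yield (code point, name) for each named character below limit.

    With no limit, the bound comes from sys.maxunicode, so the loop
    covers whatever range the running interpreter supports.
    """
    if limit is None:
        limit = sys.maxunicode + 1
    for i in range(limit):
        name = unicodedata.name(chr(i), None)
        if name is not None:
            yield i, name


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x10000)
    print(hex(high), hex(low))
    for i, name in named_characters(0x100):
        print(i, name)
```

On Python 3.3 and later there is no narrow/wide distinction, so sys.maxunicode is always 0x10FFFF and len() of a single non-BMP character is 1. On a narrow Python 2 build, sys.maxunicode is 0xFFFF, which is the boundary used in the original loop.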