On Wed, 21 Oct 2009 05:16:56 -0400, Chris Jones wrote:

>> > Where are the literals (i.e. u'\N{DEGREE SIGN}') defined? 
>> 
>> You can get them from the unicodedata module, e.g.:
>> 
>>      import unicodedata
>>      for i in xrange(0x10000):
>>        n = unicodedata.name(unichr(i),None)
>>        if n is not None:
>>          print i, n
> 
> Python rocks!
> 
> Just curious, why did you choose to set the upper boundary at 0xffff?

Characters outside the 16-bit range (i.e. outside the BMP) aren't
supported on all builds. Narrow builds reject them in unichr(), and most
Windows builds are narrow, as Windows uses 16-bit Unicode extensively:

        Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
        >>> unichr(0x10000)
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        ValueError: unichr() arg not in range(0x10000) (narrow Python build)
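
If you would rather not hard-code 0x10000, one option is to ask the
interpreter which kind of build it is. A minimal sketch, assuming you only
need the upper bound for a loop like the one quoted above; the helper names
is_narrow_build and code_point_limit are made up for illustration:

```python
import sys

def is_narrow_build():
    # sys.maxunicode is 0xFFFF on a narrow build and 0x10FFFF on a wide
    # build (Python 3.3 and later always report 0x10FFFF).
    return sys.maxunicode == 0xFFFF

def code_point_limit():
    # Exclusive upper bound that is safe to pass to unichr() (or chr()
    # on Python 3) without raising ValueError.
    return sys.maxunicode + 1

print("narrow build: %s, limit: %#x" % (is_narrow_build(), code_point_limit()))
```

Using code_point_limit() in place of the literal 0x10000 keeps the loop
working on both kinds of build, at the cost of a much longer run on wide
builds.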

Note that narrow builds do understand names outside of the BMP, and
generate surrogate pairs for them:

        >>> u'\N{LINEAR B SYLLABLE B008 A}'
        u'\U00010000'
        >>> len(_)
        2
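
The length of 2 comes from the standard UTF-16 surrogate encoding. A rough
sketch of the arithmetic, for illustration only (to_surrogate_pair is a
hypothetical helper, not something the interpreter exposes):

```python
def to_surrogate_pair(code_point):
    # Only code points above the BMP are encoded as surrogate pairs.
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> low surrogate
    return high, low

high, low = to_surrogate_pair(0x10000)
print("%#x %#x" % (high, low))  # prints: 0xd800 0xdc00
```

So on a narrow build the literal above is stored as the two code units
0xD800 and 0xDC00, which is why len() reports 2.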

Whether or not using surrogates in this context is a good idea is open to
debate. What's the advantage of a multi-wchar string over a multi-byte
string?

-- 
http://mail.python.org/mailman/listinfo/python-list
