On Sun, 22 Jan 2017 07:21 am, Pete Forman wrote:

> Marko Rauhamaa <ma...@pacujo.net> writes:
>
>>> py> low = '\uDC37'
>>
>> That should raise a SyntaxError exception.
>
> Quite. My point was that with older Python on a narrow build (Windows
> and Mac) you need to understand that you are using UTF-16 rather than
> Unicode.
But you're *not* using UTF-16, at least not proper UTF-16, in older narrow
builds. If you were, then Unicode strings u'...' containing surrogate pairs
would be treated as single supplementary code points, but they aren't.

unichr() doesn't support supplementary code points in narrow builds:

[steve@ando ~]$ python2.7 -c "print len(unichr(0x10900))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: unichr() arg not in range(0x10000) (narrow Python build)

and even if you sneak a supplementary code point in, it is treated wrongly:

[steve@ando ~]$ python2.7 -c "print len(u'\U00010900')"
2

So Python narrow builds are more like a bastard hybrid of UCS-2 and UTF-16.

-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

-- 
https://mail.python.org/mailman/listinfo/python-list
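For comparison, here is a minimal sketch of what proper UTF-16 does with U+10900, assuming Python 3.4 or later. Python 3.3+ strings always store whole code points, so the narrow/wide build split is gone, and the `surrogatepass` error handler works with the UTF-16 codecs from 3.4 on. The helper `utf16_surrogate_pair` is hypothetical, written only to show the arithmetic that splits a supplementary code point into a high and low surrogate.

```python
# Sketch assuming Python 3.4+: str holds whole code points, so one
# supplementary character has length 1, while UTF-16 needs two code units.

def utf16_surrogate_pair(cp):
    """Hypothetical helper: split a supplementary code point
    (U+10000..U+10FFFF) into its UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % cp)
    offset = cp - 0x10000               # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low surrogate
    return high, low

s = chr(0x10900)                        # chr() accepts any code point in Python 3
print(len(s))                           # 1 code point
print(len(s.encode("utf-16-le")) // 2)  # 2 UTF-16 code units
print([hex(u) for u in utf16_surrogate_pair(0x10900)])

# Two lone surrogates in a str are still two code points...
pair = "".join(chr(u) for u in utf16_surrogate_pair(0x10900))
print(len(pair))                        # 2
# ...but round-tripping them through real UTF-16 yields the single
# supplementary code point, which a narrow build never did.
joined = pair.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
print(joined == s)                      # True
```

The last two lines show the distinction being made above: a UTF-16 decoder pairs the surrogates into U+10900, whereas a narrow build kept reporting them as two separate units.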