On Tue, 07 Mar 2017 14:05:15 -0800, John Nagle wrote:

> How do I test if a Python 2.7.8 build was built for 32-bit Unicode?

sys.maxunicode will be 1114111 if it is a "wide" (32-bit) build and
65535 if it is a "narrow" (16-bit) build.

You can double-check with:

    unichr(0x10FFFF)      # will raise ValueError in a narrow build
    len(u'\U0010FFFF')    # returns 1 in a wide build, or 2 in a narrow build

but the maxunicode test is the right way to do it.

> (I'm dealing with shared hosting, and I'm stuck with their provided
> versions.)
>
> If I give this to Python 2.7.x:
>
>     sy = u'\U0001f60f'
>
> len(sy) is 1 on a Ubuntu 14.04LTS machine, but 2 on the Red Hat shared
> hosting machine. I assume "1" indicates 32-bit Unicode capability, and
> "2" indicates 16-bit.

> It looks like Python 2.x in 16-bit mode is using a UTF-16 pair
> encoding, like Java. Is that right?

Correct.

> Is it documented somewhere?

https://docs.python.org/2/library/sys.html#sys.maxunicode
https://docs.python.org/3/library/sys.html#sys.maxunicode

Here's the PEP that introduced the distinction in the first place:

https://www.python.org/dev/peps/pep-0261/

And here's the PEP that removes the distinction once and for all (at
least in CPython):

https://www.python.org/dev/peps/pep-0393/

I know the narrow/wide distinction was documented in the build
instructions for when you compiled Python from source; that's obsolete
since 3.3. I believe the compiler options were --enable-unicode=ucs4 and
--enable-unicode=ucs2 (but don't quote me on that).

-- 
Steve
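
P.S. In case it helps, here is a minimal sketch of the maxunicode test, plus the arithmetic a narrow build applies when it stores an astral character as a UTF-16 surrogate pair. The function names unicode_build_width and utf16_surrogate_pair are made up for this illustration and are not part of any library; the only real API used is sys.maxunicode. It should run unchanged on Python 2.7 and 3.x.

```python
import sys


def unicode_build_width(maxunicode=None):
    """Classify a build as 'wide' or 'narrow' from its sys.maxunicode value.

    Hypothetical helper: pass a value explicitly to classify some other
    interpreter's sys.maxunicode, or omit it to check the running one.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # A narrow build reports 65535 (0xFFFF); a wide build reports 1114111.
    return "wide" if maxunicode > 0xFFFF else "narrow"


def utf16_surrogate_pair(code_point):
    """Return the (high, low) UTF-16 surrogates for a code point above U+FFFF.

    This is the standard UTF-16 arithmetic, and it is why len() is 2 for
    such a character on a narrow build.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


if __name__ == "__main__":
    print("this interpreter is a %s build" % unicode_build_width())
    print("U+1F60F as surrogates: %04X %04X" % utf16_surrogate_pair(0x1F60F))
```

On CPython 3.3 and later the first function always answers "wide", since sys.maxunicode is always 1114111 there.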