On Sunday 22 August 2010, it occurred to jmfauth to exclaim:
> I think there is a small point here.
>
> >>> sys.version
> 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)]
>
> >>> print unichr.__doc__
> unichr(i) -> Unicode character
>
> Return a Unicode string of one character with ordinal i; 0 <= i <= 0x10ffff.
>
> >>> # but
> >>> unichr(0x10fff)
> Traceback (most recent call last):
>   File "<psi last command>", line 1, in <module>
> ValueError: unichr() arg not in range(0x10000) (narrow Python build)
This is very tricky ground. I consider the behaviour of unichr() to be wrong
here. The user shouldn't have to care much about UTF-16 and the difference
between wide and narrow Py_UNICODE builds. In fact, in Python 3.1 this
behaviour has changed: on a narrow Python 3 build,
chr(0x10fff) == '\ud803\udfff' == '\U00010fff'. (A sketch of building such a
surrogate pair by hand in Python 2 follows at the end of this message.)

Now, the Python 2 behaviour can't be fixed [1] -- it was specified in PEP 261
[2], which means it was pretty much set in stone. Back then it was deemed more
important for unichr() to always return a length-one string than for it to
work with wide characters. And then they added pretty half-arsed UTF-16
support on top of that...

The docstring could be changed for narrow Python builds. I myself don't think
docstrings should change depending on build options like this -- but it could
be amended to document the different behaviours. Note that the docs [3]
already include this information. If you want to, feel free to report a bug
at http://bugs.python.org/

> Note:
>
> I find
>     0x0 <= i <= 0xffff
> more logical than
>     0 <= i <= 0xffff
>
> (orange-apple comparaison)

Would a zero by any other name not look as small? Honestly, I myself find it
nonsensical to qualify 0 by specifying a base, unless you go all the way and
represent the full uint16_t by saying 0x0000 <= i <= 0xffff.

- Thomas

[1] http://bugs.python.org/issue1057588
[2] http://www.python.org/dev/peps/pep-0261/
[3] http://docs.python.org/library/functions.html#unichr
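As a footnote for anyone stuck on a narrow build: the surrogate pair can be
assembled by hand. Below is a minimal sketch -- the helper name wide_unichr
is purely illustrative, not anything from the stdlib -- that gives the same
result a wide build (or Python 3.1's chr()) would:

    import sys

    def wide_unichr(i):
        """Return a unicode string for code point i on any Python 2 build.

        Illustrative helper only: on a narrow build, code points above
        0xFFFF are returned as the UTF-16 surrogate pair that a wide
        build would store internally.
        """
        if not 0 <= i <= 0x10FFFF:
            raise ValueError("code point out of range")
        if i < 0x10000 or sys.maxunicode > 0xFFFF:
            return unichr(i)              # BMP character, or a wide build
        i -= 0x10000
        lead = 0xD800 + (i >> 10)         # high (lead) surrogate
        trail = 0xDC00 + (i & 0x3FF)      # low (trail) surrogate
        return unichr(lead) + unichr(trail)

    print repr(wide_unichr(0x10fff))      # u'\ud803\udfff' on a narrow build

The same effect can be had by round-tripping through the unicode-escape
codec, e.g. ('\\U%08x' % 0x10fff).decode('unicode-escape'), which also
yields the surrogate pair on a narrow build.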