New submission from Daniel Stutzbach <dan...@stutzbachenterprises.com>:
If ./configure detects that the system's wchar_t type is compatible, it emits "#define PY_UNICODE_TYPE wchar_t" and enables certain optimizations when converting between Py_UNICODE and wchar_t (i.e., it can just do a memcpy).

Right now, ./configure considers wchar_t to be compatible if it is the same bit-width as Py_UNICODE and if wchar_t is unsigned. In practice, that means Python only uses wchar_t on Windows, which has an unsigned 16-bit wchar_t. On Linux, wchar_t is 32-bit and signed.

In the original Unicode implementation for Python, Py_UNICODE was always 16-bit. I believe the "unsigned" requirement harks back to that time. A 32-bit wchar_t gives us plenty of space to hold the maximum Unicode code point of 0x10FFFF, regardless of whether wchar_t is signed or unsigned.

I believe the condition could be relaxed to the following:

- wchar_t must be the same bit-width as Py_UNICODE, and
- if wchar_t is 16-bit, it must be unsigned

That would allow a UCS4 Python to use wchar_t on Linux.

I experimented by manually tweaking my pyconfig.h to treat Linux's signed 32-bit wchar_t as compatible. The unit test suite encountered no problems.

However, it's quite possible that I'm missing some important detail here. Someone familiar with the guts of Python's Unicode implementation will presumably have a much better idea of whether I have this right or not. ;-)

----------
components: Interpreter Core, Unicode
messages: 106235
nosy: stutzbach
priority: normal
severity: normal
stage: needs patch
status: open
title: 32-bit wchar_t doesn't need to be unsigned to be usable (I think)
type: performance
versions: Python 3.2

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8781>
_______________________________________
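
For illustration, the relaxed compatibility rule proposed in this report could be sketched in C roughly as follows. This is only a sketch of the two conditions as stated in the message, not the actual ./configure logic; the function names wchar_usable, host_wchar_bits and host_wchar_is_signed are hypothetical and do not exist in CPython.

```c
#include <limits.h>
#include <wchar.h>

/* Hypothetical helper (not a CPython API): width of the host wchar_t in bits. */
int host_wchar_bits(void)
{
    return (int)(sizeof(wchar_t) * CHAR_BIT);
}

/* Hypothetical helper: 1 if the host wchar_t is a signed type, else 0. */
int host_wchar_is_signed(void)
{
    return WCHAR_MIN < 0;
}

/* Hypothetical check implementing the relaxed rule from the report:
 *   - wchar_t must have the same bit-width as Py_UNICODE, and
 *   - if wchar_t is 16-bit, it must be unsigned.
 * Returns 1 if wchar_t would be treated as usable, 0 otherwise. */
int wchar_usable(int wchar_bits, int wchar_is_signed, int py_unicode_bits)
{
    if (wchar_bits != py_unicode_bits)
        return 0;   /* widths differ, so a plain memcpy conversion is not possible */
    if (wchar_bits == 16 && wchar_is_signed)
        return 0;   /* signed 16-bit cannot hold 0x8000..0xFFFF as non-negative values */
    return 1;       /* 32-bit holds up to 0x10FFFF whether signed or unsigned */
}
```

Under this rule, calling wchar_usable(host_wchar_bits(), host_wchar_is_signed(), 32) on a system with a signed 32-bit wchar_t would report it as usable for a UCS4 build, whereas the current "must be unsigned" rule would reject it.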