[issue8781] 32-bit wchar_t doesn't need to be unsigned to be usable (I think)

Daniel Stutzbach Fri, 21 May 2010 05:44:09 -0700

New submission from Daniel Stutzbach <dan...@stutzbachenterprises.com>:


If ./configure detects that the system's wchar_t type is compatible, it will 
emit "#define PY_UNICODE_TYPE wchar_t" and enable certain optimizations when 
converting between Py_UNICODE and wchar_t (i.e., it can just do a memcpy).
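
As a rough illustration of the optimization mentioned above (a sketch only, not
CPython's actual conversion code): when the two types are the same, a block copy
is enough, while differing types need an element-by-element loop. The names
demo_unicode, demo_copy_fast and demo_copy_slow are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for Py_UNICODE. In this sketch it is simply wchar_t,
   which is the situation "#define PY_UNICODE_TYPE wchar_t" would create. */
typedef wchar_t demo_unicode;

/* Fast path: source and destination have the same type, so copy the bytes. */
static void demo_copy_fast(wchar_t *dst, const demo_unicode *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n * sizeof(wchar_t));
}

/* Slow path: the source uses a different (here 16-bit unsigned) code unit
   type, so each element has to be converted individually. */
static void demo_copy_slow(wchar_t *dst, const unsigned short *src, size_t n)
{
    size_t i;
    assert(dst != NULL && src != NULL);
    for (i = 0; i < n; i++)
        dst[i] = (wchar_t)src[i];
}
```

Both helpers produce the same result; the fast path just avoids the
per-element loop when the types are interchangeable.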

Right now, ./configure considers wchar_t to be compatible if it is the same 
bit-width as Py_UNICODE and if wchar_t is unsigned.  In practice, that means 
Python only uses wchar_t on Windows, which uses an unsigned 16-bit wchar_t.  On 
Linux, wchar_t is 32-bit and signed.

In the original Unicode implementation for Python, Py_UNICODE was always 
16-bit.  I believe the "unsigned" requirement harks back to that time.  A 
32-bit wchar_t gives us plenty of space to hold the maximum Unicode code point 
of 0x10FFFF, regardless of whether wchar_t is signed or unsigned.
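
The arithmetic behind that claim can be checked with a small sketch. It assumes
fixed-width types from <stdint.h> as stand-ins for a signed 32-bit and a signed
16-bit wchar_t; all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_CODE_POINT 0x10FFFF  /* highest Unicode code point */

/* Number of bits needed to represent a non-negative value. */
static int demo_bits_needed(uint32_t v)
{
    int bits = 0;
    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}

/* Store a code point in a signed 32-bit slot and read it back. */
static uint32_t demo_round_trip_int32(uint32_t cp)
{
    int32_t stored;
    assert(cp <= DEMO_MAX_CODE_POINT);  /* caller passes a valid code point */
    stored = (int32_t)cp;               /* value is in range, so it is preserved */
    return (uint32_t)stored;
}

/* A signed 16-bit slot only holds non-negative values up to 0x7FFF. */
static int demo_fits_int16(uint32_t unit)
{
    return unit <= (uint32_t)INT16_MAX;
}
```

0x10FFFF needs 21 bits, and a signed 32-bit integer has 31 value bits, so the
sign bit is never touched. A 16-bit code unit such as 0xFFFF, on the other
hand, does not fit in a signed 16-bit type, which is why signedness matters
only in the 16-bit case.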

I believe the condition could be relaxed to the following:
- wchar_t must be the same bit-width as Py_UNICODE, and
- if wchar_t is 16-bit, it must be unsigned

That would allow a UCS4 Python to use wchar_t on Linux.
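
The relaxed condition could be written as a predicate along these lines. This
is a sketch of the proposal, not the logic ./configure actually uses; the
function names and parameters are made up for illustration.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical helper: reports whether this platform's wchar_t is signed. */
static int demo_wchar_is_signed(void)
{
    return (wchar_t)-1 < (wchar_t)0;
}

/* Hypothetical helper implementing the proposed rule:
   - wchar_t must have the same bit-width as Py_UNICODE, and
   - a 16-bit wchar_t must additionally be unsigned. */
static int demo_wchar_usable(unsigned wchar_bits, int wchar_is_signed,
                             unsigned unicode_bits)
{
    assert(wchar_bits > 0 && unicode_bits > 0);
    if (wchar_bits != unicode_bits)
        return 0;   /* widths differ: not interchangeable */
    if (wchar_bits == 16 && wchar_is_signed)
        return 0;   /* signed 16-bit cannot hold 0x8000..0xFFFF */
    return 1;       /* e.g. signed 32-bit is accepted */
}
```

Under this rule, demo_wchar_usable(32, 1, 32) accepts the signed 32-bit case
described for Linux with a UCS4 build, while demo_wchar_usable(16, 1, 16)
still rejects a signed 16-bit wchar_t.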

I experimented by manually tweaking my pyconfig.h to treat Linux's signed 
32-bit wchar_t as compatible.  The unit test suite encountered no problems.

However, it's quite possible that I'm missing some important detail here.
Someone familiar with the guts of Python's Unicode implementation will 
presumably have a much better idea of whether I have this right or not. ;-)

----------
components: Interpreter Core, Unicode
messages: 106235
nosy: stutzbach
priority: normal
severity: normal
stage: needs patch
status: open
title: 32-bit wchar_t doesn't need to be unsigned to be usable (I think)
type: performance
versions: Python 3.2

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8781>
_______________________________________