Mark Dickinson <[EMAIL PROTECTED]> added the comment:

I'm now very confused.

In trying to follow things of type wchar_t* around the Python source, I 
discovered PyUnicode_FromWideChar in unicodeobject.c.  For OS X, the 
conversion lands in the following code, where w is the incoming WideChar 
array, declared as wchar_t *.

        register Py_UNICODE *u;
        register Py_ssize_t i;
        u = PyUnicode_AS_UNICODE(unicode);
        for (i = size; i > 0; i--)
            *u++ = *w++;

But this looks wrong:  on OS X, sizeof(wchar_t) is 4 and I think w is 
encoded in UTF-32.  So I was expecting to see some kind of explicit 
conversion from UTF-32 to UCS-2 here.  Instead, it looks as though the 
incoming values are implicitly truncated from 32 bits to 16.  Doesn't this 
do the wrong thing for characters outside the BMP?
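
To make the concern concrete, here is a minimal sketch, assuming a build 
where Py_UNICODE is 16 bits wide.  It is not the CPython code: the helper 
names utf32_to_utf16 and truncate_to_16 are hypothetical and exist only for 
illustration.  The sketch contrasts what a plain assignment to a 16-bit 
type does with what an explicit conversion to a UTF-16 surrogate pair would 
produce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of CPython): encode one UTF-32 code point
 * as UTF-16.  Writes 1 or 2 units into out and returns the count, or
 * returns 0 if cp is not a valid Unicode scalar value. */
static size_t
utf32_to_utf16(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                       /* out of range or lone surrogate */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;          /* BMP: one unit, value unchanged */
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* Hypothetical helper: what "*u++ = *w++" amounts to when the destination
 * is 16 bits and the source is 32 bits.  The upper bits are discarded. */
static uint16_t
truncate_to_16(uint32_t cp)
{
    return (uint16_t)cp;
}
```

For U+1D11E, truncation yields 0xD11E, an unrelated BMP code point, while 
the explicit conversion yields the pair 0xD834 0xDD1E.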

Should I open an issue for this, or am I simply misunderstanding?

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue4388>
_______________________________________