Re: Unicode problem in ucs4

Martin v. Löwis Mon, 23 Mar 2009 17:00:40 -0700

> So, both Py_UNICODE and wchar_t are 4 bytes and since it contains 3
> \0s after a char, printf or wprintf is only printing one letter.


No. printf will indeed stop at the first zero byte, which it treats as
the string terminator. However, wprintf knows that a wchar_t has four
bytes per character and should print the string correctly. Make sure
to use %ls to print wchar_t arrays; %s would print multi-byte character
strings.
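A small C99 sketch of the difference (the helper names are made up
for illustration). The first function shows what a narrow %s sees when
handed the raw bytes of a wide string. The others use %ls, the
conversion the wide printf family expects for wchar_t arrays.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* What printf("%s") would see if handed a wchar_t array cast to char *:
   it stops at the first zero byte. With a 4-byte little-endian wchar_t,
   L"abc" is stored as 'a' 0 0 0 'b' 0 0 0 ..., so this returns 1. */
size_t bytes_seen_by_percent_s(const wchar_t *ws)
{
    assert(ws != NULL);
    return strlen((const char *)ws);
}

/* Format a wide string with %ls into a wchar_t buffer. Returns the number
   of wide characters written (excluding the terminator), or a negative
   value on error. */
int format_wide(wchar_t *out, size_t out_len, const wchar_t *ws)
{
    assert(out != NULL && ws != NULL);
    return swprintf(out, out_len, L"%ls", ws);
}

/* Print a wchar_t array to stdout. Don't mix this with printf on the same
   stream: the first I/O call fixes stdout as byte- or wide-oriented. */
int print_wide(const wchar_t *ws)
{
    return wprintf(L"%ls\n", ws);
}
```

If the string contains non-ASCII characters, call setlocale(LC_ALL, "")
(from <locale.h>) before print_wide; in the default "C" locale the
conversion to output bytes may fail.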

> I need to further process the data and those libraries will need the
> data in UCS2 format (2 bytes), otherwise they fail.

Are you absolutely sure about that? Why does that library expect
UCS-2, when your system's wchar_t is four bytes?
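The question matters because a 4-byte wchar_t can hold any Unicode
code point, while UCS-2 only covers U+0000..U+FFFF. A minimal plain-C
sketch of the narrowing step, independent of Python; the function name
ucs4_to_ucs2 is hypothetical, and rejecting non-BMP input (instead of
emitting UTF-16 surrogate pairs) is an assumption about what the
library can accept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Narrow UCS-4 code points (one uint32_t each, as stored in a 4-byte
   wchar_t or a UCS-4 Py_UNICODE) to UCS-2 (one uint16_t each).
   UCS-2 only covers the BMP, U+0000..U+FFFF, so this sketch rejects any
   code point above that, plus the surrogate range U+D800..U+DFFF.
   Returns the number of units written, or -1 if the input cannot be
   represented or out is too small (out may then be partially written). */
ptrdiff_t ucs4_to_ucs2(const uint32_t *in, size_t n,
                       uint16_t *out, size_t out_cap)
{
    size_t i;
    assert((in != NULL && out != NULL) || n == 0);
    if (n > out_cap)
        return -1;
    for (i = 0; i < n; i++) {
        uint32_t cp = in[i];
        if (cp > 0xFFFFu || (cp >= 0xD800u && cp <= 0xDFFFu))
            return -1; /* not representable in UCS-2 */
        out[i] = (uint16_t)cp;
    }
    return (ptrdiff_t)n;
}
```

If a single character falls outside the BMP, the conversion fails;
that is exactly the case where a UCS-2-only library cannot represent
the data at all.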

In any case, do what MAL told you: use the UCS-2 codec to convert
the Unicode string to a 2-bytes-per-char byte string. The PyObject
you get from the conversion is a byte string object; use
PyString_AsStringAndSize to get to the actual bytes.
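On the C side, the bytes you get back still have to be read as 16-bit
units. A minimal sketch, assuming little-endian byte order and no
byte-order mark; which order you actually get depends on the codec you
choose, so check that first. buf and len stand for the values returned
through PyString_AsStringAndSize's char ** and Py_ssize_t *
out-parameters; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 2-bytes-per-char byte string into 16-bit units.
   ASSUMPTION: little-endian byte order, no byte-order mark.
   Returns the number of units, or -1 if len is odd or out is too small. */
ptrdiff_t ucs2le_bytes_to_units(const char *buf, size_t len,
                                uint16_t *out, size_t out_cap)
{
    size_t i, n;
    assert(buf != NULL || len == 0);
    if (len % 2 != 0)
        return -1; /* not a whole number of 2-byte characters */
    n = len / 2;
    if (n > out_cap)
        return -1;
    for (i = 0; i < n; i++) {
        /* Go through unsigned char so bytes >= 0x80 are not sign-extended. */
        const unsigned char lo = (unsigned char)buf[2 * i];
        const unsigned char hi = (unsigned char)buf[2 * i + 1];
        out[i] = (uint16_t)(lo | (hi << 8));
    }
    return (ptrdiff_t)n;
}
```

If the library takes the raw byte buffer directly, you may not need
this step at all; it is only needed when the data must be handled as
individual 16-bit characters.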

Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list
