"Brendan Miller" <catph...@catphive.net> wrote in message
news:aanlkti=2f3l++398st-16mpes8wzfblbu+qa8ztpa...@mail.gmail.com...
2010/9/29 Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand>:
In message <mailman.1132.1285714474.29448.python-l...@python.org>, Brendan Miller wrote:
It seems that characters not in the ascii subset of UTF-8 are
discarded by c_char_p during the conversion ...
Not a chance.
... or at least they don't print out when I go to print the string.
So it seems there’s a problem on the printing side. What happens when you
construct a UTF-8-encoded string directly in Python and try printing it the
same way?
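Lawrence's question can be checked with no C++ involved at all. Here is a minimal sketch, written for Python 3 rather than the Python 2 used in this thread. It encodes the text to UTF-8, copies the bytes into a ctypes buffer, and decodes them again. If the round trip succeeds, ctypes itself is not dropping the non-ASCII bytes.

```python
# Python 3 sketch: push UTF-8 bytes through a ctypes buffer and back.
from ctypes import create_string_buffer

text = "日本語のテスト"
encoded = text.encode("utf-8")          # each of these 7 characters takes 3 bytes
buf = create_string_buffer(encoded)     # copies the bytes and appends a NUL terminator
round_trip = buf.value.decode("utf-8")  # .value stops at the first NUL byte

print(len(text), len(encoded), len(buf), round_trip == text)  # 7 21 22 True
```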
Doing this seems to confirm something is broken in ctypes w.r.t. UTF-8...
if I enter:
str = "日本語のテスト"
Then:
print str
日本語のテスト
However, when I create a string buffer, pass it into my c++ code, and
write the same UTF-8 string into it, python seems to discard pretty
much all the text. The same code works for pure ascii strings.
Python code:
_std_string_size = _lib_mbxclient.std_string_size
_std_string_size.restype = c_long
_std_string_size.argtypes = [c_void_p]
_std_string_copy = _lib_mbxclient.std_string_copy
_std_string_copy.restype = None
_std_string_copy.argtypes = [c_void_p, POINTER(c_char)]
# This function works for ascii, but breaks on strings with UTF-8!
def std_string_to_string(str_ptr):
    buf = create_string_buffer(_std_string_size(str_ptr))
    _std_string_copy(str_ptr, buf)
    return buf.raw
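Since the mbxclient library isn't available here, the following sketch swaps in hypothetical Python stand-ins (fake_size, fake_copy) that behave like the C++ quoted below: the size is a byte count, and the copy writes exactly that many bytes with no NUL terminator. It assumes Python 3. It illustrates why returning buf.raw is a sound choice: .raw returns every byte in the buffer, while .value stops at the first NUL.

```python
# Python 3 sketch. fake_size/fake_copy are hypothetical stand-ins for the
# C++ std_string_size/std_string_copy exports in this thread.
from ctypes import create_string_buffer, memmove


def fake_size(data: bytes) -> int:
    # Like std::string::size(): a count of bytes, not characters.
    return len(data)


def fake_copy(data: bytes, buf) -> None:
    # Like std::copy over the string: writes len(data) bytes, no terminator.
    memmove(buf, data, len(data))


def std_string_to_bytes(data: bytes) -> bytes:
    buf = create_string_buffer(fake_size(data))  # zero-filled, exactly that size
    fake_copy(data, buf)
    return buf.raw  # every byte, including non-ASCII bytes and embedded NULs


utf8 = "日本語のテスト".encode("utf-8")
print(std_string_to_bytes(utf8) == utf8)  # True: no bytes are discarded

with_nul = b"abc\x00def"
nul_buf = create_string_buffer(len(with_nul))
fake_copy(with_nul, nul_buf)
print(nul_buf.raw)    # b'abc\x00def'
print(nul_buf.value)  # b'abc'
```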
C++ code:
extern "C"
long std_string_size(string* str)
{
    return str->size();
}
extern "C"
void std_string_copy(string* str, char* buf)
{
    std::copy(str->begin(), str->end(), buf);
}
I didn't see what OS you are using, but I fleshed out your example code and
have a working example for Windows. Below is the code for the DLL and
script:
--------- x.cpp [cl /LD /EHsc /W4 x.cpp] ----------------------------------------------------
#include <string>
#include <algorithm>
using namespace std;
extern "C" __declspec(dllexport) long std_string_size(string* str)
{
    return str->size();
}
extern "C" __declspec(dllexport) void std_string_copy(string* str, char* buf)
{
    std::copy(str->begin(), str->end(), buf);
}
extern "C" __declspec(dllexport) void* make(const char* s)
{
    return new string(s);
}
extern "C" __declspec(dllexport) void destroy(void* s)
{
    delete (string*)s;
}
---- x.py ---------------------------------------------------------
# coding: utf8
from ctypes import *
_lib_mbxclient = CDLL('x')
_std_string_size = _lib_mbxclient.std_string_size
_std_string_size.restype = c_long
_std_string_size.argtypes = [c_void_p]
_std_string_copy = _lib_mbxclient.std_string_copy
_std_string_copy.restype = None
_std_string_copy.argtypes = [c_void_p, c_char_p]
make = _lib_mbxclient.make
make.restype = c_void_p
make.argtypes = [c_char_p]
destroy = _lib_mbxclient.destroy
destroy.restype = None
destroy.argtypes = [c_void_p]
# This function works for ascii, but breaks on strings with UTF-8!
def std_string_to_string(str_ptr):
    buf = create_string_buffer(_std_string_size(str_ptr))
    _std_string_copy(str_ptr, buf)
    return buf.raw
s = make(u'我是美国人。'.encode('utf8'))
print std_string_to_string(s).decode('utf8')
------------------------------------------------------
And the output (in PythonWin, since the US Windows console doesn't support Chinese):
我是美国人。
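Whether decoded text shows up correctly also depends on the console's encoding, as the note about PythonWin suggests. A small Python 3 sketch for checking this up front, assuming the console uses code page 437 (the usual default on US-English Windows); can_display is a hypothetical helper name, not a standard function:

```python
# Python 3 sketch: can_display is a hypothetical helper that reports whether
# every character of `text` can be represented in `encoding`.
import sys


def can_display(text: str, encoding: str) -> bool:
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


sample = "我是美国人。"
print(can_display(sample, "cp437"))  # False: code page 437 has no CJK characters
print(can_display(sample, "utf-8"))  # True
print(sys.stdout.encoding)           # whatever this process's stdout is using
```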
I used c_char_p instead of POINTER(c_char) and added functions to create and
destroy a std::string for Python's use, but it is otherwise the same as your
code.
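Because make() allocates a std::string with new, every call needs a matching destroy(). One way to guarantee that on the Python side is a context manager. In this Python 3 sketch, make and destroy are passed in as parameters: in the thread they would be the ctypes functions exported by x.dll, and here they are replaced by hypothetical in-memory fakes so the example runs without the DLL.

```python
# Python 3 sketch: pair make() with destroy() so the C++ std::string is
# always freed, even if the body of the with-block raises.
from contextlib import contextmanager


@contextmanager
def cpp_string(make, destroy, text: str, encoding: str = "utf-8"):
    ptr = make(text.encode(encoding))  # C++ side: new string(s)
    try:
        yield ptr
    finally:
        destroy(ptr)                   # C++ side: delete (string*)s


# Hypothetical fakes standing in for the DLL exports.
store = {}


def fake_make(data: bytes) -> int:
    handle = len(store) + 1
    store[handle] = data
    return handle


def fake_destroy(handle: int) -> None:
    del store[handle]


with cpp_string(fake_make, fake_destroy, "我是美国人。") as h:
    print(len(store[h]))  # 18 bytes of UTF-8 for 6 characters
print(len(store))         # 0: destroy() ran on exit
```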
Hope this helps you work it out,
-Mark
--
http://mail.python.org/mailman/listinfo/python-list