Python comes in two flavors. In one, sys.maxunicode is 65535 and Py_UNICODE is a 16-bit type; in the other, sys.maxunicode is 1114111 and Py_UNICODE is a 32-bit type. The choice is made at compile time, and some RedHat releases ship a Python built with sys.maxunicode == 1114111.
By using the Py_UNICODE typedef, you generally don't have to worry about this distinction. Here is some code that works both on RedHat 9's Python 2.2 (sys.maxunicode == 1114111) and on a manually built Python 2.3 (sys.maxunicode == 65535):

    #include <Python.h>

    /* Print the code units of a byte string or a unicode string. */
    PyObject *f(PyObject *self, PyObject *o)
    {
        if (PyString_Check(o)) {
            char *c = PyString_AS_STRING(o);
            int sz = PyString_GET_SIZE(o);
            int i;
            printf(" Byte string: ");
            for (i = 0; i < sz; i++) {
                /* cast so bytes >= 0x80 are not sign-extended */
                printf("%4x ", (unsigned char)c[i]);
            }
            printf("\n");
        } else if (PyUnicode_Check(o)) {
            /* Py_UNICODE is 16 or 32 bits wide, depending on the build */
            Py_UNICODE *c = PyUnicode_AS_UNICODE(o);
            int sz = PyUnicode_GET_SIZE(o);
            int i;
            printf("Unicode string: ");
            for (i = 0; i < sz; i++) {
                /* widen explicitly so %lx is correct for either size */
                printf("%4lx ", (unsigned long)c[i]);
            }
            printf("\n");
        }
        Py_INCREF(Py_None);
        return Py_None;
    }

    PyMethodDef d[] = {
        { "f", (PyCFunction)f, METH_O,
          "Print out the values in a string from C" },
        { NULL, NULL, 0, NULL }
    };

    void initunidemo(void)
    {
        Py_InitModule("unidemo", d);
    }

    $ # build unidemo for python2.2
    $ python2.2 -c 'import unidemo, sys; print sys.maxunicode; unidemo.f(u"\N{copyright sign}\N{greek capital letter sigma}")'
    1114111
    Unicode string: a9 3a3
    $ # rebuild unidemo for python2.3
    $ python2.3 -c 'import unidemo, sys; print sys.maxunicode; unidemo.f(u"\N{copyright sign}\N{greek capital letter sigma}")'
    65535
    Unicode string: a9 3a3

Jeff
-- http://mail.python.org/mailman/listinfo/python-list