On Mar 23, 4:57 pm, abhi <abhigyan_agra...@in.ibm.com> wrote:
> On Mar 23, 4:37 pm, "M.-A. Lemburg" <m...@egenix.com> wrote:
> > On 2009-03-23 11:50, abhi wrote:
> > > On Mar 23, 3:04 pm, "M.-A. Lemburg" <m...@egenix.com> wrote:
> > > Thanks Marc, John,
> > > With your help, I am at least somewhere. I re-wrote the code
> > > to compare Py_Unicode and wchar_t outputs and they both look exactly
> > > the same.
> > >
> > > #include<Python.h>
> > >
> > > static PyObject *unicode_helper(PyObject *self,PyObject *args){
> > >     const char *name;
> > >     PyObject *sampleObj = NULL;
> > >     Py_UNICODE *sample = NULL;
> > >     wchar_t * w=NULL;
> > >     int size = 0;
> > >     int i;
> > >
> > >     if (!PyArg_ParseTuple(args, "O", &sampleObj)){
> > >         return NULL;
> > >     }
> > >
> > >     // Explicitly convert it to unicode and get Py_UNICODE value
> > >     sampleObj = PyUnicode_FromObject(sampleObj);
> > >     sample = PyUnicode_AS_UNICODE(sampleObj);
> > >     printf("size of sampleObj is : %d\n",PyUnicode_GET_SIZE(sampleObj));
> > >     w = (wchar_t *) malloc((PyUnicode_GET_SIZE(sampleObj)+1)*sizeof(wchar_t));
> > >     size = PyUnicode_AsWideChar(sampleObj,w,(PyUnicode_GET_SIZE(sampleObj)+1)*sizeof(wchar_t));
> > >     printf("%d chars are copied to w\n",size);
> > >     printf("size of wchar_t is : %d\n", sizeof(wchar_t));
> > >     printf("size of Py_UNICODE is: %d\n",sizeof(Py_UNICODE));
> > >     for(i=0;i<PyUnicode_GET_SIZE(sampleObj);i++){
> > >         printf("sample is : %c\n",sample[i]);
> > >         printf("w is : %c\n",w[i]);
> > >     }
> > >     return sampleObj;
> > > }
> > >
> > > static PyMethodDef funcs[]={{"unicodeTest",(PyCFunction)unicode_helper,METH_VARARGS,"test ucs2, ucs4"},{NULL}};
> > >
> > > void initunicodeTest(void){
> > >     Py_InitModule3("unicodeTest",funcs,"");
> > > }
> > >
> > > This gives the following output when I pass "abc" as input:
> > >
> > > size of sampleObj is : 3
> > > 3 chars are copied to w
> > > size of wchar_t is : 4
> > > size of Py_UNICODE is: 4
> > > sample is : a
> > > w is : a
> > > sample is : b
> > > w is : b
> > > sample is : c
> > > w is : c
> > >
> > > So, both Py_UNICODE and wchar_t are 4 bytes and since it contains 3
> > > \0s after a char, printf or wprintf is only printing one letter.
> > > I need to further process the data and those libraries will need the
> > > data in UCS2 format (2 bytes), otherwise they fail. Is there any way
> > > by which I can force wchar_t to be 2 bytes, or can I convert this UCS4
> > > data to UCS2 explicitly?
> >
> > Sure: just use the appropriate UTF-16 codec for this.
> >
> > /* Generic codec based encoding API.
> >
> >    object is passed through the encoder function found for the given
> >    encoding using the error handling method defined by errors. errors
> >    may be NULL to use the default method defined for the codec.
> >
> >    Raises a LookupError in case no encoder can be found.
> >
> >  */
> >
> > PyAPI_FUNC(PyObject *) PyCodec_Encode(
> >        PyObject *object,
> >        const char *encoding,
> >        const char *errors
> >        );
> >
> > encoding needs to be set to 'utf-16-le' for little endian, 'utf-16-be'
> > for big endian.
> >
> > --
> > Marc-Andre Lemburg
> > eGenix.com
>
> Thanks, but this is returning PyObject *, whereas I need value in some
> variable which can be printed using wprintf() like wchar_t (having a
> size of 2 bytes). If I again convert this PyObject to wchar_t or
> PyUnicode, I go back to where I started. :)
>
> -
> Abhigyan

Hi Marc,

Is there any way to ensure that wchar_t is always 2 bytes instead of 4 in a UCS4-configured Python? Googling gave me the impression that PyUnicode_AsWideChar() contains logic that handles the UCS4-to-UCS2 conversion when the sizes of Py_UNICODE and wchar_t differ.

- Abhigyan
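
For reference, the width of wchar_t is set by the compiler and C library, not by Python's UCS2/UCS4 build option, so one way to get 2-byte units regardless of platform is to convert explicitly into a 16-bit buffer. Below is a minimal sketch of such a conversion in plain C. It is not part of the Python C API: the helper name ucs4_to_utf16 is hypothetical, and it uses uint32_t and uint16_t as stand-ins for a 4-byte Py_UNICODE and the 2-byte type the downstream library expects. Code points above U+FFFF cannot fit in a single UCS2 unit, so the sketch writes them as UTF-16 surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Python C API function).
 * Converts `len` UCS4 code points from `src` into UTF-16 code units in
 * `dst`, in host byte order. `dst_cap` is the capacity of `dst` counted
 * in 16-bit units. Returns the number of units written, or (size_t)-1
 * if `dst` is too small or `src` contains an invalid code point. */
size_t ucs4_to_utf16(const uint32_t *src, size_t len,
                     uint16_t *dst, size_t dst_cap)
{
    size_t out = 0;
    size_t i;

    assert(src != NULL && dst != NULL);

    for (i = 0; i < len; i++) {
        uint32_t cp = src[i];

        /* Reject values outside Unicode and lone surrogates. */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (cp < 0x10000) {
            /* Basic Multilingual Plane: one 16-bit unit. */
            if (out + 1 > dst_cap)
                return (size_t)-1;
            dst[out++] = (uint16_t)cp;
        } else {
            /* Supplementary plane: encode as a surrogate pair. */
            if (out + 2 > dst_cap)
                return (size_t)-1;
            cp -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[out++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return out;
}
```

Allocating 2 * len + 1 units for dst always leaves room for the worst case plus a terminating 0, which the caller has to append itself. The result is a uint16_t array, not a wchar_t array, so on a platform where wchar_t is 4 bytes it still cannot be passed to wprintf(); it is only meant for libraries that take 2-byte units.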