Hi,

Thank you for your answer. That confirms what Martin v. Löwis says: when Python is compiled, you can choose between UCS-2 and UCS-4 for the internal unicode representation.
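
For what it's worth, a quick way to see which of the two a given interpreter was built with seems to be to look at sys.maxunicode. This is only a minimal sketch: the helper name build_kind is made up for illustration, and it assumes sys.maxunicode is either 0xFFFF (16-bit code units) or 0x10FFFF (32-bit code units). Note that from Python 3.3 on the value is always 0x10FFFF, so the distinction only shows up on older builds.

```python
import sys


def build_kind(maxunicode=None):
    """Classify a build from its largest code point (hypothetical helper)."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # 0xFFFF   -> 16-bit code units (UCS-2, a "narrow" build)
    # 0x10FFFF -> 32-bit code units (UCS-4, a "wide" build)
    return "UCS-2" if maxunicode <= 0xFFFF else "UCS-4"


# Report what the running interpreter uses.
print(hex(sys.maxunicode), build_kind())
```

On a narrow build, characters above 0xFFFF are stored as surrogate pairs and count as two items, which matches the language reference text quoted below.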
Francis Girard

On Tuesday 8 March 2005 00:44, Jeff Epler wrote:
> On Mon, Mar 07, 2005 at 11:56:57PM +0100, Francis Girard wrote:
> > BTW, the python "unicode" built-in function documentation says it returns
> > a "unicode" string which scarcely means something. What is the python
> > "internal" unicode encoding ?
>
> The language reference says fairly little about unicode objects. Here's
> what it does say: [http://docs.python.org/ref/types.html#l2h-48]
>     Unicode
>         The items of a Unicode object are Unicode code units. A Unicode
>         code unit is represented by a Unicode object of one item and can
>         hold either a 16-bit or 32-bit value representing a Unicode
>         ordinal (the maximum value for the ordinal is given in
>         sys.maxunicode, and depends on how Python is configured at
>         compile time). Surrogate pairs may be present in the Unicode
>         object, and will be reported as two separate items. The built-in
>         functions unichr() and ord() convert between code units and
>         nonnegative integers representing the Unicode ordinals as
>         defined in the Unicode Standard 3.0. Conversion from and to
>         other encodings are possible through the Unicode method encode
>         and the built-in function unicode().
>
> In terms of the CPython implementation, the PyUnicodeObject is laid out
> as follows:
>     typedef struct {
>         PyObject_HEAD
>         int length;             /* Length of raw Unicode data in buffer */
>         Py_UNICODE *str;        /* Raw Unicode buffer */
>         long hash;              /* Hash value; -1 if not set */
>         PyObject *defenc;       /* (Default) Encoded version as Python
>                                    string, or NULL; this is used for
>                                    implementing the buffer protocol */
>     } PyUnicodeObject;
> Py_UNICODE is some "C" integral type that can hold values up to
> sys.maxunicode (probably one of unsigned short, unsigned int, unsigned
> long, wchar_t).
>
> Jeff