On Feb 6, 9:24 pm, Chris Rebert <c...@rebertia.com> wrote:
> On Fri, Feb 6, 2009 at 1:49 AM, Kalyankumar Ramaseshan
> <soft_sm...@yahoo.com> wrote:
> > Hi,
> >
> > Excuse me if this is a repeat question!
> >
> > I just wanted to know how are strings represented in python?
> >
> > I need to know in terms of:
> >
> > a) Strings are stored as UTF-16 (LE/BE) or UTF-32 characters?
Neither.

> IIRC, Depends on what the build settings were when CPython was
> compiled. UTF-16 is the default.

Unicode strings are held as arrays of 16-bit numbers or 32-bit numbers [of which only 21 bits are used]. If you must use an acronym, use UCS-2 or UCS-4.

The UTF-n siblings are *external* representations. You get them by encoding:
2.x: a_unicode_object.encode('UTF-16') -> an_str_object
3.x: an_str_object.encode('UTF-16') -> a_bytes_object

By the way, has anyone come up with a name for the shifting effect observed above on str, and also with repr, range, and the iter* family? If not, I suggest that the language's association with the best of English humour be widened so that it be dubbed the "Mad Hatter's Tea Party" effect.
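
P.S. For anyone who wants to see the internal/external distinction in action, here is a minimal sketch of the 3.x case. It assumes Python 3.3 or later, where len() counts code points regardless of build. The sample string and the variable names are made up for illustration.

```python
# Python 3: a str is a sequence of code points; encode() produces an
# external byte representation, decode() turns bytes back into a str.
s = "a\u20ac\U0001F600"  # 'a', EURO SIGN, and one character outside the BMP

# Naming the byte order explicitly means no BOM is prepended.
utf16 = s.encode("utf-16-le")
utf32 = s.encode("utf-32-le")

print(type(s).__name__, len(s))          # str 3
print(type(utf16).__name__, len(utf16))  # bytes 8  (2 + 2 + 4, surrogate pair)
print(type(utf32).__name__, len(utf32))  # bytes 12 (4 bytes per code point)

# Round trip: external bytes -> internal str
roundtrip = utf16.decode("utf-16-le")
print(roundtrip == s)                    # True
```

Plain 'utf-16' (no LE/BE suffix) also works, but then encode() prepends a byte order mark, so the byte count goes up by two.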