MRAB wrote:
> I don't think you should be saying that it stores the string in Latin-1
> or UTF-16 because that might suggest that they are encoded. They aren't.
Of course they are encoded. Memory consists of bytes, not Unicode code
points, which are abstract numbers representing characters (and other
things). You can't store "ξ" (U+03BE) in memory; you can only store a
particular representation of that "ξ" in bytes, and that representation
is called an encoding.

Of course you can create whatever representation you like, or you can
use an established encoding rather than re-invent the wheel. Here are
four established encodings which support that code point, and the bytes
that are used:

py> u'ξ'.encode('iso-8859-7')
'\xee'
py> u'ξ'.encode('utf-8')
'\xce\xbe'
py> u'ξ'.encode('utf-16be')
'\x03\xbe'
py> u'ξ'.encode('utf-32be')
'\x00\x00\x03\xbe'

-- 
Steven
-- 
https://mail.python.org/mailman/listinfo/python-list
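The session above uses Python 2 (u'' literals, byte strings shown as
plain '...'). A minimal Python 3 sketch of the same point follows, where
str holds code points and str.encode() returns bytes. The names
CODECS and byte_forms are hypothetical, chosen only for illustration;
the codec names are the same four used in the session.

```python
# Python 3 restatement of the py> session: one code point, four
# different byte representations depending on the encoding chosen.
# CODECS and byte_forms are illustrative names, not a library API.
CODECS = ("iso-8859-7", "utf-8", "utf-16be", "utf-32be")


def byte_forms(text, codecs=CODECS):
    """Map each codec name to the bytes that represent `text` in it."""
    return {codec: text.encode(codec) for codec in codecs}


forms = byte_forms("\u03be")  # U+03BE GREEK SMALL LETTER XI, 'ξ'
for codec, data in forms.items():
    # Decoding with the same codec recovers the original code point.
    assert data.decode(codec) == "\u03be"
    print(f"{codec:>10}: {data!r}")
```

Each line printed shows the same bytes as the corresponding Python 2
output, now displayed as a bytes literal (for example b'\xce\xbe' for
UTF-8).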