On Sunday, September 7, 2014 10:33:26 PM UTC+5:30, Steven D'Aprano wrote:
> MRAB wrote:
> > I don't think you should be saying that it stores the string in Latin-1
> > or UTF-16 because that might suggest that they are encoded. They aren't.
>
> Of course they are encoded. Memory consists of bytes, not Unicode code
> points, which are abstract numbers representing characters (and other
> things). You can't store "ξ" (U+03BE) in memory, you can only store a
> particular representation of that "ξ" in bytes, and that representation is
> called an encoding. Of course you can create whatever representation you
> like, or you can use an established encoding rather than re-invent the
> wheel. Here are four established encodings which support that code point,
> and the bytes that are used:
>
> py> u'ξ'.encode('iso-8859-7')
> '\xee'
> py> u'ξ'.encode('utf-8')
> '\xce\xbe'
> py> u'ξ'.encode('utf-16be')
> '\x03\xbe'
> py> u'ξ'.encode('utf-32be')
> '\x00\x00\x03\xbe'

Dunno about philosophical questions -- especially Unicode ones :-)

What I can see (Python 3), which I guess is what MRAB was pointing out, is
that str only has encode and bytes only has decode:

>>> "".encode
<built-in method encode of str object at 0x7f3955da3848>
>>> "".decode
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
>>> b"".decode
<built-in method decode of bytes object at 0x7f39549fda08>
>>> b"".encode
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'encode'
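
To put the two views side by side, here is a small Python 3 sketch of my own (not from either poster). It redoes the four encodings quoted above on a str, checks that each one round-trips through bytes.decode, and then confirms the asymmetry from the traceback. The helper name encodings_of is made up for illustration; everything else is plain builtins.

```python
# Python 3 sketch: str holds code points, bytes holds encoded data.
# `encodings_of` is a hypothetical helper name, not part of any library.

def encodings_of(text, codecs=("iso-8859-7", "utf-8", "utf-16be", "utf-32be")):
    """Map each codec name to the bytes it produces for `text`."""
    return {name: text.encode(name) for name in codecs}

xi = "\u03be"  # GREEK SMALL LETTER XI, the character used in the quoted post
table = encodings_of(xi)

for name, raw in table.items():
    # Decoding with the same codec gives back the original str.
    assert raw.decode(name) == xi
    print(name, raw)

# In Python 3 each type only converts in one direction:
str_can_decode = hasattr(str, "decode")
bytes_can_encode = hasattr(bytes, "encode")
print(str_can_decode, bytes_can_encode)
```

So at the language level a Python 3 str is never "encoded" (you have to call encode to get bytes), which does not contradict the point that the interpreter must pick some byte representation internally.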