On Sunday, September 7, 2014 11:38:41 PM UTC+5:30, Steven D'Aprano wrote:
> Rustom Mody wrote:
> > On Sunday, September 7, 2014 10:33:26 PM UTC+5:30, Steven D'Aprano wrote:
> >> MRAB wrote:
> >> > I don't think you should be saying that it stores the string in Latin-1
> >> > or UTF-16 because that might suggest that they are encoded. They
> >> > aren't.
> >> Of course they are encoded. Memory consists of bytes, not Unicode code
> >> points, [...]
> > Dunno about philosophical questions -- especially unicode :-)
> > What I can see (python 3) which is I guess what MRAB was pointing out:
> > >>> "".encode
> > >>> "".decode
> > Traceback (most recent call last):
> > AttributeError: 'str' object has no attribute 'decode'
> What's your point? I'm talking about the implementation of how strings are
> stored in memory, not what methods the str class provides.

The methods that are (un)available reflect which operations are (in)valid
on the type:

Strings
    The items of a string object are Unicode code units. Conversion from
    and to other encodings are possible through the string method encode().

Bytes
    A bytes object is an immutable array. The items are 8-bit bytes,
    represented by integers in the range 0 <= x < 256. Bytes literals
    (like b'abc') and the built-in function bytes() can be used to
    construct bytes objects. Also, bytes objects can be decoded to strings
    via the decode() method.

From https://docs.python.org/3.1/reference/datamodel.html#the-standard-type-hierarchy

IOW, MRAB's statement that strings should not be thought of as encoded,
because they consist of abstract code points, seems to me (a
unicode-ignoramus!) a reasonable outlook.
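
A minimal sketch of the str/bytes distinction discussed in this thread, assuming Python 3. The variable names are illustrative only, and the final size comparison relies on a CPython implementation detail (the flexible string representation of PEP 393, CPython 3.3+) that other implementations need not share:

```python
import sys

s = "caf\u00e9"            # str: a sequence of code points, here 4 of them
b = s.encode("utf-8")      # bytes: one concrete encoding of that text

# Which conversions each type offers at the language level
str_has_encode = hasattr(s, "encode")      # str -> bytes is supported
str_has_decode = hasattr(s, "decode")      # no decode() on a Python 3 str
bytes_has_decode = hasattr(b, "decode")    # bytes -> str is supported
bytes_has_encode = hasattr(b, "encode")    # no encode() on bytes

# Decoding with the same codec gives back the original text
round_trip = b.decode("utf-8")

print(len(s), len(b))
print(str_has_encode, str_has_decode, bytes_has_decode, bytes_has_encode)

# Implementation detail (assumption: CPython 3.3+). The interpreter still
# has to pick some byte layout internally, and the per-character width can
# differ between strings, which shows up in the reported object size.
print(sys.getsizeof("a" * 100), sys.getsizeof("\u20ac" * 100))
```

The first part shows the language-level view (no decode() on str, no encode() on bytes); the last line hints at the in-memory view, where some internal layout necessarily exists even though the str API does not expose it.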