On Mon, Sep 8, 2014 at 12:52 AM, MRAB <pyt...@mrabarnett.plus.com> wrote:
> I don't think you should be saying that it stores the string in Latin-1
> or UTF-16 because that might suggest that they are encoded. They aren't.

Except that they are. RAM stores bytes [1], so by definition everything
that's in memory is encoded. You can't store a list in memory; what you
store is a set of bits which represent some metadata and a bunch of
pointers. You can't store a non-integer in memory, so you use some kind
of efficient packed system like IEEE 754. You can't even store an
integer without using some kind of encoding, most likely by packing it
into some number of bytes and laying those bytes out either smallest
first or largest first.

So yes, CPython 3.3 stores strings encoded as Latin-1, UCS-2 [2], or
UCS-4. The Python string *is* a sequence of characters, but it's
*stored* as a sequence of bytes in one of those encodings. (And other
Pythons may not use the same encodings. MicroPython uses UTF-8
internally, which gives it *very* different indexing performance.)

ChrisA

[1] On modern systems it stores larger units, probably 64-bit or 128-bit
hunks, but whatever. Same difference.
[2] As Steven says, UTF-16 or UCS-2. I prefer the latter name here, as
it (like Latin-1) is restricted in character set rather than variable in
length. But same thing.
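
The point about everything in memory being encoded can be sketched in a few lines of Python. This is only an illustration, assuming CPython 3.3 or later: the helper name bytes_per_char is made up for this example, and sys.getsizeof is an indirect measurement of the internal layout, so other implementations (MicroPython included) may report different numbers.

```python
import struct
import sys

# An integer has to be packed into bytes, smallest byte first or largest first.
little = (1).to_bytes(4, "little")
big = (1).to_bytes(4, "big")

# A non-integer is packed too, here as an IEEE 754 double (big-endian).
half = struct.pack(">d", 0.5)

# The same idea for single characters in fixed-width encodings.
latin1 = "\u00e9".encode("latin-1")        # 1 byte per character
ucs2 = "\u20ac".encode("utf-16-le")        # 2 bytes for a BMP character
ucs4 = "\U0001f600".encode("utf-32-le")    # 4 bytes per character


def bytes_per_char(ch):
    """Hypothetical helper: estimate CPython's per-character storage.

    Compares the sizes of two strings made only of `ch` that differ by
    1000 characters, so the fixed object header cancels out.
    """
    shorter = ch * 1000
    longer = ch * 2000
    return (sys.getsizeof(longer) - sys.getsizeof(shorter)) // 1000


if __name__ == "__main__":
    print(little, big, half)
    for ch in ("a", "\u00e9", "\u20ac", "\U0001f600"):
        print("U+%04X" % ord(ch), bytes_per_char(ch), "byte(s) per character")
```

On CPython the loop should report 1, 1, 2, and 4 bytes per character: the width is chosen from the highest code point in the string, which is why indexing stays constant-time.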