On Mon, Sep 8, 2014 at 1:40 AM, Roy Smith <r...@panix.com> wrote:
> Well, technically, what you store is something which has the right
> behavior.  If I wrote:
>
> my_huffman_coded_list = [0] * 1000000
>
> I don't know of anything which requires Python to actually generate a
> million 0's and store them somewhere (even ignoring interning of
> integers).  As long as it generated an object (perhaps a subclass of
> list) which responded to all of list's methods the same way a real list
> would, it could certainly build a more compact representation.
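For illustration only, here is a rough sketch of the kind of object Roy describes: something that answers like `[value] * count` without storing `count` slots. The name `RepeatedList` is made up for this example, it is read-only (a real list is mutable, so a faithful replacement would need much more), and nothing here is claimed to be what any actual Python implementation does.

```python
from collections.abc import Sequence


class RepeatedList(Sequence):
    """Hypothetical read-only stand-in for ``[value] * count``."""

    def __init__(self, value, count):
        self._value = value   # the single object every position refers to
        self._count = count   # the logical length; no per-item storage

    def __len__(self):
        return self._count

    def __getitem__(self, index):
        if isinstance(index, slice):
            # A slice of a constant sequence is a shorter constant sequence.
            new_len = len(range(*index.indices(self._count)))
            return RepeatedList(self._value, new_len)
        if not -self._count <= index < self._count:
            raise IndexError("RepeatedList index out of range")
        return self._value


compact = RepeatedList(0, 1000000)
print(len(compact), compact[0], compact[-1], len(compact[10:20]))
```

Inheriting from `collections.abc.Sequence` supplies `__iter__`, `__contains__`, `index` and `count` on top of the two methods defined here, which is why the class can stay this short.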

Steven hinted at it, but I'll say one thing more explicitly here:
There's actually something that requires Python to *not* generate a
million 0 integers. What you get is a million references to the *same*
zero.

>>> another_list = [object()] * 1000000
>>> sum(id(x) for x in another_list)
140287290433648000000
>>> id(another_list[0]) * len(another_list)
140287290433648000000

The two figures are guaranteed to be the same, because these are all
the same object.

But what you're talking about here is an alternative encoding. And
it's definitely possible for different Pythons to encode strings
differently. uPy (MicroPython) uses UTF-8 internally, which gives
different performance characteristics but guarantees the same
semantics.

I could imagine someone implementing a Python interpreter in Pike, and
using the Pike string type to store Python strings. The semantics will
all be correct, as it's a Unicode string; the most notable difference
is that Pike strings are guaranteed to be interned, so all equality
comparisons are identity checks.

If you wanted to, I'm sure you could build a Python that uses a
dictionary of words (added to every time you create a string, of
course), and actually represents entire words as short integers, which
would mean individual characters aren't necessarily represented
directly. But somehow, you have to turn the concept of "sequence of
Unicode characters" into some well-defined sequence of bytes, and
that's an encoding.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list