On 2012-09-10, Chris Angelico <ros...@gmail.com> wrote: > On Tue, Sep 11, 2012 at 12:06 AM, Oscar Benjamin ><oscar.j.benja...@gmail.com> wrote: >> On 2012-09-10, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote: >>> What interning buys you is that "s == t" is an O(1) pointer compare if >>> they are equal. But if s and t differ in the last character, __eq__ will >>> still inspect every character. There is no way to tell Python "all >>> strings are interned, if s is not t then s != t as well". >>> >> >> I thought that if *both* strings were interned then a pointer comparison >> could decide if they were unequal without needing to check the characters. >> >> Have I misunderstood how intern() works? > > In a language where _all_ strings are guaranteed to be interned (such > as Lua, I think), you do indeed gain this. Pointer inequality implies > string inequality. But when interning is optional (as in Python), you > cannot depend on that, unless there's some way of recognizing interned > strings. Of course, that may indeed be the case; a simple bit flag > "this string has been interned" would suffice, and if both strings are > interned AND their pointers differ, THEN you can be sure the strings > differ. > > I have no idea whether or not CPython version X.Y.Z does this. The > value of such an optimization really depends on how likely strings are > to be interned; for instance, if the compiler automatically interns > all the names of builtins, this could be quite beneficial. Otherwise, > probably not; most Python scripts don't bother interning anything. >
I haven't looked at the source but my understanding was precisely that there is an intern() bit and that not only the builtins module but all the literals in any byte-compiled module are interned. Oscar -- http://mail.python.org/mailman/listinfo/python-list