On Tue, Jan 24, 2012 at 3:18 PM, Terry Reedy <tjre...@udel.edu> wrote:
> I think that the devs decided that interning is a minor internal
> optimization that users generally should not fiddle with (especially how
> that so much is done automatically anyway*), while having it a builtin made
> it look like something they should pay attention to.
>
> *I am not sure but what hashes for strings either are or in 3.3 will always
> be cached.

I'm of the opinion that hash() shouldn't be relied upon, but apparently
there's code "out there" that would break if hash() changed (and, quite
reasonably, the devs don't want to make a sudden change in a bug-fix
release).

String interning basically turns every string into a completely opaque
hash; you can use 'is' to test for equality of two interned strings.
Having intern() as a builtin cannot encourage any worse behavior than
relying on hash(), imho, since neither makes any promise of constancy
across runs.

Lua and Pike both quite happily solved hash collision attacks in their
interning of strings by randomizing the hash used, because there's no
way for code to rely on it. Presumably (based on the intern() docs)
Python can do the same, if you explicitly intern your strings first.

Is it worth recommending that people do this with anything that is
client-provided, and then simply randomize the intern() hash? This would
allow hash() to be unchanged, intern() to still do exactly what it's
always done, and hash collision attacks to be eliminated.

ChrisA
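
P.S. A minimal sketch of the "use 'is' on interned strings" point, in
case it helps. It assumes Python 3, where the builtin intern() lives at
sys.intern(); the names same_interned, key_a and key_b are made up for
illustration. It only shows the identity guarantee and does not attempt
the hash randomization idea.

```python
import sys

def same_interned(a, b):
    """Compare two strings by identity after interning both."""
    # sys.intern() returns the canonical copy for a given value, so equal
    # strings map to the same object and 'is' is a valid equality test.
    return sys.intern(a) is sys.intern(b)

# Build the strings at runtime so the result does not depend on the
# compiler merging identical literals.
key_a = "".join(["client", "-", "key"])
key_b = "".join(["client-", "key"])

print(key_a == key_b)                  # True
print(same_interned(key_a, key_b))     # True
print(same_interned(key_a, "other"))   # False
```

The identity test is only meaningful once both sides have been through
the interning call; comparing un-interned strings with 'is' is not
reliable.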