On Thu, 20 Sep 2007 04:02:03 +0000, Karthik Gurusamy wrote: > On Sep 19, 7:17 pm, Steven D'Aprano <[EMAIL PROTECTED] > cybersource.com.au> wrote: >> On Wed, 19 Sep 2007 20:58:03 +0000, Karthik Gurusamy wrote: >> > While it's easy to explain the behavior, I think the decision to dis- >> > allow mutable items as keys is a bit arbitrary. There is no need for >> > dict to recompute hash >> >> What??? >> >> Of course it does. How else can it look up the key? Because it >> (somehow) just recognizes that it has seen the key before? How? By >> magic? > > You answered it yourself later. For a mapping service, hash is just one > way to do things. What you need is for each item in the collection, a > unique key.
And does the mapping get access to that unique key (the key's key)? It can't keep a mapping of object:key, because if it could do that, it wouldn't need the key, it could just keep object:payload. Since the mapping can't know the key's key, it has to ask the key, and that's what the __hash__ method does. > How you go from the key to the value is not something a programmer needs > to know. You are correct. All you need to know is that in Python, you can't use lists and sets as keys directly. You only need to know about the details of the way dicts look up keys if you are writing your own class, and you aren't happy with Python's default hash for instance objects. It isn't a common task. > Your mind is set on thinking on hash alone and hence you don't see > beyond it. No, these issues exist regardless of the implementation of the mapping. Whether you use a hash table or a binary tree of some sort, or a linear linked list, or a database, or folders in a filing cabinet. The behaviour of mutable objects in a mapping is always problematic, regardless of the mapping implementation. >> > (first of all, a user doesn't even need to know if underneath >> > 'hashing' is used -- the service is just a mapping between one item >> > to another item). >> >> The user doesn't need to know the mechanism, but the dict does. Dicts >> are implemented as hash tables. I suppose they could be implemented as >> something else (what? linear lists? some sort of tree?) but the same >> considerations must be made: > > Oh yes. If the keys are all integers (or any set of items that can be > ordered), why not an avl. It has guaranteed O(log N) while a hash in > worst case is O(N). But both the best and average cases are O(1), which beats AVL trees by a lot. Since dicts are used for Python's internals, they are HIGHLY optimized and VERY fast. Their O(1) will beat the O(log N) of AVL trees easily. Hash tables can also use keys even if they can't be ordered: {1+2j: None} > Why you want to tie yourself to the drawbacks of one > datastructure? Understand your goal is not to provide a hash; but to > provide a mapping service. No, the goal of a good language designer is to provide the fastest, most lightweight, simplest, most flexible mapping as a built-in. Any old slow, heavyweight, complicated, inflexible mapping will not do the job. That goal is best provided by a hash table. If people want additional mappings as well, they can be added as libraries. If those libraries are very useful, they can be added to the standard library. If they are HUGELY useful, they will become built-ins. AVL trees are not as flexible or fast as hash tables, and even if they were, you would *still* need to resolve the difficulties that occur if the keys mutate. >> the dict must be able to find keys it has >> seen before. How is the dict supposed to recognise the key if the key >> has changed? >> >> > Since we know hashing is used, all that is needed is, a well-defined >> > way to construct a hash out of a mutable. "Given a sequence, how to >> > get a hash" is the problem. >> >> Nonsense. That's not the problem. The problem is how to get exactly the >> same hash when the sequence has changed. > > Yes, if you keep thinking hash is the only tool you got. Fine, let me re-word it in terms of an AVL. Suppose you have two lists in an AVL: L1 = [1, 2, 3] L2 = [4, 5, 6] M = avl((L1, True), (L2, False)) The tree M (for mapping) has L1 at the top of the tree, and L2 as the right node. But now, EVERY time ANY mutable object changes, Python has to check whether it is a key in EVERY avl, and if so, re-built the tree. Otherwise the tree can become corrupt because the AVL invariants are no longer true. (Consider what happens if we say L1[0] = 999. Now L1 > L2. If you don't reorder the avl, M[L2] cannot be reached except by an exhaustive search of every node. That means it is no longer an AVL tree, just an inordered tree. Might as well save memory and use a linear linked list.) Needless to say, this will slow down Python just a tad. Look, you can go back to the archives of 1993 when this was first discussed. Nothing has changed since then. Mutable keys are still problematic, regardless of the implementation, and the simplest solution is to simply prohibit them. http://www.python.org/search/hypermail/python-1993/0044.html If you want to use a mutable object as a key, by object identity rather than value, then the easy way is to wrap it in an instance: class Wrapper: # don't even need new-style classes pass # or even an __init__ key = Wrapper() key.payload = [1, 2, 3] D = {key: "value"} But of course you can't look up the dict by value, only by identity. But that's what you wanted. Another way is to use this class: class HashableList(list): def __hash__(self): return hash(tuple(self)) Have fun, and remember us when you're debugging your code, because you'll be doing a lot of it. -- Steven. -- http://mail.python.org/mailman/listinfo/python-list