On Sep 19, 7:17 pm, Steven D'Aprano <[EMAIL PROTECTED] cybersource.com.au> wrote: > On Wed, 19 Sep 2007 20:58:03 +0000, Karthik Gurusamy wrote: > > While it's easy to explain the behavior, I think the decision to dis- > > allow mutable items as keys is a bit arbitrary. There is no need for > > dict to recompute hash > > What??? > > Of course it does. How else can it look up the key? Because it (somehow) > just recognizes that it has seen the key before? How? By magic?
You answered it yourself later. For a mapping service, hash is just one way to do things. What you need is for each item in the collection, a unique key. How you go from the key to the value is not something a programmer needs to know. Your mind is set on thinking on hash alone and hence you don't see beyond it. > > > (first of all, a user doesn't even need to know > > if underneath 'hashing' is used -- the service is just a mapping between > > one item to another item). > > The user doesn't need to know the mechanism, but the dict does. Dicts are > implemented as hash tables. I suppose they could be implemented as > something else (what? linear lists? some sort of tree?) but the same > considerations must be made: Oh yes. If the keys are all integers (or any set of items that can be ordered), why not an avl. It has guaranteed O(log N) while a hash in worst case is O(N). Why you want to tie yourself to the drawbacks of one datastructure? Understand your goal is not to provide a hash; but to provide a mapping service. the dict must be able to find keys it has > seen before. How is the dict supposed to recognise the key if the key has > changed? > > > Since we know hashing is used, all that is needed is, a well-defined way > > to construct a hash out of a mutable. "Given a sequence, how to get a > > hash" is the problem. > > Nonsense. That's not the problem. The problem is how to get exactly the > same hash when the sequence has changed. Yes, if you keep thinking hash is the only tool you got. > > In other words, if you have two mutable objects M1 and M2, then you > expect: > No. I don't expect. I expect the hash to be different. Why do you keep thinking it's the mappings responsibility to take care of a changing key. > hash(M1) == hash(M2) if and only if M1 and M2 are equal > hash(M1) != hash(M2) if M1 and M2 are unequal > > but also: > > if M1 mutates to become equal to M2, hash(M1) must remain the same while > still being different from hash(M2). > > That means that hash() now is a non-deterministic function. hash([1,2,3]) > will vary according to how the list [1,2,3] was constructed! > > Obviously something has to give, because not all of these things are > mutually compatible. > > > If later the given sequence is different, that's > > not the dict's problem. > > Data structures don't have problems. Programmers do. And language > designers with sense build languages that minimize the programmers > problems, not maximize them. Yes, here you talk about a different goal altogether. Here comes the 'arbitrary' part I mentioned. > > > So if the list changes, it will result in a different hash and we will > > get a hash-miss. I doubt this is in anyway less intuitive than dis- > > allowing mutable items as keys. > > The choices for the language designer are: > > (1) Invent some sort of magical non-deterministic hash function which > always does the Right Thing. Nope, just say if the new sequence is different, you don't find the item in the dict. > > (2) Allow the hash of mutable objects to change, which means you can use > mutable objects as keys in dicts but if you change them, you can no > longer find them in the dict. They'll still be there, using up memory, > but you can't get to them. In the new model, at the time of addition, you need to remember the key at that time. If it's a list, you make a copy of the items. > > (3) Simply disallow mutable objects as keys. > > Alternative 1 is impossible, as we've seen, because the requirements for > the Right Thing are not mutually compatible. > > Alternative (2) leads to hard-to-find, hard-to-diagnose bugs where you > store objects in a dict only for them to mysteriously disappear later. > Worse, it could lead to bugs like the following hypothetical: Of course they can be reached with.. for k in dict... > > >>> M = [1, 2, 3] > >>> D = {M: 'parrot'} # pretend this works > >>> D > > {[1, 2, 3]: 'parrot'}>>> M.append(4) > >>> D > > {[1, 2, 3, 4]: 'parrot'}>>> D[[1, 2, 3, 4]] No, in the new way, the key still remains [1, 2, 3] What was changed is M. Not the key given to dict at the time of addition. Again I'm not describing today's behavior; it's in the new way. > > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > KeyError: [1, 2, 3, 4] > > Try explaining that one to programmers: they can SEE the key in the dict > when they print it, but they can't get it or delete it because the hash > has changed. No they don't. They see the key at the time of addition ([1,2,3]) > > Alternative 3 is easy to deal with: simply don't use mutable objects as > keys. That's what Python does. Sure, the programmer sometimes needs to > work around the lack (convert the list into a tuple, or a string, or > pickle it, whatever...) which on rare occasions is hard to do, but at > least there are no mysterious, hard to track down bugs. When I first saw key's must'be be mutable, it did appear to me to be mysterious. There was unnecessary tighter coupling between implementation details and the service exposed to the programmer. (As I see it, the root cause of all this is, the dict does not make a copy of the key at the time of item addition, it just makes a new reference to the same object) Karthik > > -- > Steven. -- http://mail.python.org/mailman/listinfo/python-list