I think that's a good summary. The condensed version is that the results of both __hash__() and __cmp__() have to remain stable for dicts to work as one would expect. __cmp__ doesn't do that for lists, and it isn't defined by default for user objects.
John Roth
"Peter Maas" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
Peter Maas schrieb:There was a huge and sometimes heated debate about tuples, lists and dictionaries recently, and the mainstream opinion was that dictionary keys must not be mutable, so lists are not allowed as dictionary keys.
Warning, long posting (~ 100 lines)
The existence of lists and tuples and the impossibility to use lists as dictionary keys is a constant source of irritation for Python newcomers. Recent threads in c.l.py have revealed that even programmers like me using Python for some years now often don't have a clear picture. To put an end to this misery I tried to learn from the recent discussions and to explain to myself what's going on. My problem was that I didn't know exactly how a dictionary and a dictionary lookup works:
A dictionary maps key values to data values. Sketch of a dictionary lookup(value = dict[key]):
def lookup(dict, key): '''dictionary lookup is done in three steps: 1. A hash value of the key is computed using a hash function. Hash functions map a large set L to a (usually) smaller set S.
A hash function must satisfy: (*) for all e1, e2 in L: h(e1) != h(e2) => e1 != e2
A hash function should satisfy as well as possible: for all e1, e2 in L: h(e1) == h(e2) => e1 == e2
Violation of the latter condition means a hash collision, i.e. two or more elements of L have the same hash value. This should be highly unlikely for good hash functions. If not, hash lookup becomes as slow as sequential lookup.
2. The hash value addresses a location in dict.data which is supposed to be an array of bins. A bin contains a collision list of (key,value) pairs.
3. The collision list addressed by the hash value is searched sequentially until a pair is found with pair[0] == key. The return value of the lookup is then pair[1]. Equality (==) is defined by the function cmp(). The return value for equality is 0, for inequality some other value. ''' h = hash(key) # step 1 cl = dict.data[h] # step 2 for pair in cl: # step 3 if cmp(key, pair[0]) == 0: return pair[1] else: raise KeyError, "Key %s not found." % key
Thus a data type must be a valid input for hash() and cmp() to be usable as a dictionary key. hash() and cmp() must satisfy the condition (*) in lookup.__doc__ for this data type.
Is a list a suitable candidate for a dictionary key? hash(list) = id(list) and cmp(list1, list2) comparing id(list1) with id(list2) would satisfy condition (*). But this definition completely ignores list contents, causing big surprises for programmers:
- Different lists with the same content would be mapped to different dictionary values.
- Dictionary lookup with list literals would be impossible.
To avoid these surprises dictionary lookup would have to use list contents instead of list id. Consider the (hypothetical, not working) code:
>>> d = dict() >>> li = [1,2,3] >>> d[li] = "123" >>> d[[1,2,3]] "123"
Now assume li is changed (e.g. li[2] = 4). There are two options to handle this,
let the dictionary ignore changes
>>> d[li]
KeyError: [1,2,4] not found in dictionary. (even worse: a previously existing
[1,2,4] map is returned).
or let the dictionary follow changes
>>> d[[1,2,3]] MovedError: Please address future lookups to [1,2,4] :)
Both are pretty unsatisfactory. Conclusion: dictionary lookup with mutable types like lists is a source of unpleasant surprises for the programmer and therefore impossible in Python. This is the point where tuples come in. They have nearly the same interface as lists but cannot be changed once they have been created thereby avoiding the problems discussed for lists. Thus tuples can be used as dictionary keys.
What about instances of user defined classes?
User defined classes can be anything the programmer likes but yet their instances are usable as dictionary keys because there is a default: hash(object) = id(object) and cmp(object1, object2) compares id(object1) with id(object2). The same setting that has been discussed above for lists and has been found unsatisfactory. What is different here?
1. It is a default setting which can be adapted to each class by defining/ overriding the methods __hash__ and __cmp__.
2. For objects identity is more important than for lists.
That's it.
Please correct me if I'm wrong. If not, a poor programmer's soul has found peace eventually :)
--
-------------------------------------------------------------------
Peter Maas, M+R Infosysteme, D-52070 Aachen, Tel +49-241-93878-0
E-mail 'cGV0ZXIubWFhc0BtcGx1c3IuZGU=\n'.decode('base64')
-------------------------------------------------------------------
-- http://mail.python.org/mailman/listinfo/python-list