I think that's a good summary. The condensed
version is that the results of both __hash__() and
__cmp__() have to remain stable for dicts to work
as one would expect. __cmp__ doesn't do that
for lists, and it isn't defined by default for user
objects.

John Roth

"Peter Maas" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
Peter Maas schrieb:
There was a huge and sometimes heated debate about tuples, lists and
dictionaries recently, and the mainstream opinion was that dictionary
keys must not be mutable, so lists are not allowed as dictionary keys.

Warning, long posting (~ 100 lines)

The existence of lists and tuples and the impossibility to use
lists as dictionary keys is a constant source of irritation for
Python newcomers. Recent threads in c.l.py have revealed that
even programmers like me using Python for some years now often don't
have a clear picture. To put an end to this misery I tried to learn
from the recent discussions and to explain to myself what's going on.
My problem was that I didn't know exactly how a dictionary and
a dictionary lookup works:

A dictionary maps key values to data values. Sketch of a dictionary
lookup(value = dict[key]):

def lookup(dict, key):
    '''dictionary lookup is done in three steps:
       1. A hash value of the key is computed using a hash function.
          Hash functions map a large set L to a (usually) smaller
          set S.

          A hash function must satisfy:
          (*) for all e1, e2 in L: h(e1) != h(e2) => e1 != e2

          A hash function should satisfy as well as possible:
              for all e1, e2 in L: h(e1) == h(e2) => e1 == e2

          Violation of the latter condition means a hash collision,
          i.e. two or more elements of L have the same hash value.
          This should be highly unlikely for good hash functions.
          If not, hash lookup becomes as slow as sequential lookup.

       2. The hash value addresses a location in dict.data which is
          supposed to be an array of bins. A bin contains a collision
          list of (key,value) pairs.

       3. The collision list addressed by the hash value is searched
          sequentially until a pair is found with pair[0] == key. The
          return value of the lookup is then pair[1]. Equality (==) is
          defined by the function cmp(). The return value for equality is
          0, for inequality some other value.
    '''
    h = hash(key)                  # step 1
    cl = dict.data[h]              # step 2
    for pair in cl:                # step 3
        if cmp(key, pair[0]) == 0:
            return pair[1]
    else:
        raise KeyError, "Key %s not found." % key

Thus a data type must be a valid input for hash() and cmp() to be usable
as a dictionary key. hash() and cmp() must satisfy the condition (*) in
lookup.__doc__ for this data type.

Is a list a suitable candidate for a dictionary key? hash(list) = id(list)
and cmp(list1, list2) comparing id(list1) with id(list2) would satisfy
condition (*). But this definition completely ignores list contents,
causing big surprises for programmers:

- Different lists with the same content would be mapped to different
  dictionary values.

- Dictionary lookup with list literals would be impossible.

To avoid these surprises dictionary lookup would have to use list contents
instead of list id. Consider the (hypothetical, not working) code:

>>> d = dict()
>>> li = [1,2,3]
>>> d[li] = "123"
>>> d[[1,2,3]]
"123"

Now assume li is changed (e.g. li[2] = 4). There are two options to handle this,
let the dictionary ignore changes


>>> d[li]
KeyError: [1,2,4] not found in dictionary. (even worse: a previously existing
[1,2,4] map is returned).


or let the dictionary follow changes

>>> d[[1,2,3]]
MovedError: Please address future lookups to [1,2,4] :)

Both are pretty unsatisfactory. Conclusion: dictionary lookup with
mutable types like lists is a source of unpleasant surprises for the
programmer and therefore impossible in Python. This is the point where
tuples come in. They have nearly the same interface as lists but cannot
be changed once they have been created thereby avoiding the problems
discussed for lists. Thus tuples can be used as dictionary keys.

What about instances of user defined classes?

User defined classes can be anything the programmer likes but yet their
instances are usable as dictionary keys because there is a default:
hash(object) = id(object) and cmp(object1, object2) compares id(object1)
with id(object2). The same setting that has been discussed above for lists
and has been found unsatisfactory. What is different here?

1. It is a default setting which can be adapted to each class by defining/
   overriding the methods __hash__ and __cmp__.

2. For objects identity is more important than for lists.

That's it.

Please correct me if I'm wrong. If not, a poor programmer's soul has
found peace eventually :)

--
-------------------------------------------------------------------
Peter Maas, M+R Infosysteme, D-52070 Aachen, Tel +49-241-93878-0
E-mail 'cGV0ZXIubWFhc0BtcGx1c3IuZGU=\n'.decode('base64')
-------------------------------------------------------------------

-- http://mail.python.org/mailman/listinfo/python-list

Reply via email to