OK, I think I need to recap a bit before starting my reply (for my own benefit, even if nobody else's). (The rest of this post will also include a fair bit of repeating things that have already been said elsewhere in the thread)

====
The actual rule dictionaries use when deciding whether or not to allow a key is simply 'can I hash it?'. Lists don't provide a hash, and tuples only provide a hash if every item they contain provides a hash.


For the dictionary to behave sanely, two keys which compare equal must also hash equal. Otherwise, (key1 == key2) == (d[key1] is d[key2]) may not hold.

So the one absolute rule on hashes is that (x == y) must imply that (hash(x) == hash(y)). In other words, the hash operation must not involve any object properties that are not also involved in comparison for equality.

The rule recommended for hashes is that a mutable object only define a hash if both the comparison operation and the hash are based on immutable portions of the object (such as it's ID). (The language reference currently recommends something stricter - it suggests that you don't define hash at all if you define a custom comparison method on a mutable object. That's overly strict, though)

Now, and here's the key point: Python's builtin objects are written in accordance with this recommendation, which means the mutable objects like dict and list don't define __hash__ at all.

Antoon Pardon wrote:
that it would be handy to use such an object as a key and he has some
assurance that those objects are stable while in use as a key, I see
nothing wrong with using such mutable objects as keys.

Scarily enough, you're starting to persuade me that the idea isn't *completely* insane. Maybe merely mostly insane ;)


The interesting thing is that allowing mutable objects has keys doesn't require *any* changes to dict. It only requires changes to the mutable objects (e.g. having list adopt tuple's hash method).

Anyway, what are the consequences of a type which has a 'variable hash'?

The only real bad consequence is that the internal state of a dictionary can break if someone alters the hash of one of its keys. The restriction to 'constant hashes' is aimed squarely at eliminating this class of programming errors. It doesn't actually provide any performance gain.

A performance loss would only occur if the dictionary tried to be helpful in detecting when a key mutates - and I see no reason why it should.

If you are saying you are happy with "undefined behavior" as the outcome of 
mutating keys, ok,
but if your implementation lets mutated-key errors pass silently, good luck 
debugging ;-)

But that problem is not limited to dictionaries. If you make a heapqueue
and mutate an element in it, such errors will pass silently too and will be just as hard to debug. Or suppose you create a class that
keeps a sorted list of a number of objects and the programmer accidently
mutates an object in it. That will probably pass silently too and
give you night mares debugging.

The longer I consider it, the more this seems like a valid analogy. There is nothing preventing dictionaries from having a rehash() method.


Consider:
# Mutate some value in mylist
mylist.sort()

# Mutate some key in mydict
mydict.rehash()

Rehash would work exactly as Antoon suggested - scan all the hash buckets, and any key which hashes incorrectly gets moved to the right place. Forgetting to rehash() would have consequences no worse than forgetting to sort().

The problem is also that you really can't make new immutable objects
AFAIK. I have once made a vector class that didn't have any mutating
method and I always used it as if it as immutable but because of
the "we are all consenting adults here" attitude of python I can't
assure that other people of my class won't directly change a vector.

Do you find I shouldn't use these vectors as keys?

This is a fair point, actually:

Py> from decimal import Decimal
Py> x = Decimal()
Py> hash(x)
0
Py> x._int = 1,
Py> hash(x)
1


But mutable objects is fraught with pitfall anyhow. Even mutating an
element in a list you are iterating over can give unexpected results
or just assigning a mutable to other names a few times and then mutating
it through one name and finding out an other name that was supposed to
stay stable, mutated too. Why else the warning against the use of
a default value of [] for an argument?

Another fair point, although I'm not sure that pointing out that mutable values already give you plenty of ways to shoot yourself in the foot is a point in *favour* of your idea :)


A last idea, but this definetly is only usefull for debugging, is
having a dictionary that keeps a copy of the key, with the ability
to check the keys againt the copy, this might help in debugging.

I have a different suggestion: an identity dictionary.

It ignores __hash__, __cmp__ and __eq__, and instead uses id() and is.

Or even something like:

override_dict(id, lambda x, y: x is y)(<normal dict contructor arguments go 
here>)

Personnaly I find the atmosphere in general here very defensive when
some kind of feature is questioned. There seems a definite "Love it
or leave it" attitude to be present.

Generally depends on the feature, but yeah, I've seen that happen (and probably been guilty of it, too)


I have usenet experience enough to know that newsgroups can vary wildly
and that you should evaluate each on its own merits. I do agree c.l.py
is very helpfull in case you need help in how to solve a particular
problem in the language. That niceness can disappear rather swiftly
if the pros and cons of python itself get discussed.

Usually because the ideas are ill-formed and poorly thought out. Unfortunately, the attitude that spawns can bleed over into ideas which aren't without merit, but aren't always expressed clearly.


I'm currently not under the impression I'm able to. Sure I could
implement the above mentioned classes, but my feeling is that
should I present them here, nobody would be waiting for them.
Not because they would be unusfull, but because they go against
the current paradigm.

As I see it, you have a couple of courses available:

If you're into pain, consider further developing the 'rehash' idea, and the option of adding hash methods to the mutable builtins. You've persuaded me that the idea is not entirely unreasonable. However, I mention pain, because you'd still have work to do persuading enough people (including me) that the additional hard to detect bugs this adds to the language would be worth it.

I don't really recommend that option.

A potentially more fruitful course is the dictionary variants identity_dict or override_dict. The current dict() implementation forces the use of hash() for sorting into hash buckets and "x == y" for key matching. An identity_dict variant that uses id() (or object.__hash__ in Jython) and "x is y" would seem to quite happily meet the use cases you have posted in this thread, without silencing currently detected bugs.

override_dict is probably just a case of compulsive overgeneralisation that can be ignored for the time being :)

Cheers,
Nick.

--
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---------------------------------------------------------------
            http://boredomandlaziness.skystorm.net
--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to