On Mar 7, 2009, at 8:30 PM, Mark Engelberg wrote:

>
> On Sat, Mar 7, 2009 at 2:57 PM, Rich Hickey <richhic...@gmail.com> wrote:
>> Identity is tested first in equality; if identical, equal, full stop.
>> So if you are using identical and unique collections as keys you'll
>> find them without a value-by-value comparison. If they are not
>> present, it's unlikely you'll get a matching hashCode and a matching
>> count and a matching long prefix from a mismatch.
>
> But if the key is a long list, wouldn't it have to traverse the list
> just to come up with the hash code for that list?  And doesn't it need
> that hash code just to know which "bin" to look in for matching keys?
> If so, then the performance hit occurs before you even get to
> comparing against other keys.

The core collection classes cache their hash codes. There is an open
issue to have them compute their hashes incrementally, amortizing the
cost over all insertions. That will work for all but lists, which use a
front-to-back hash algorithm but are built back-to-front.

http://code.google.com/p/clojure/issues/detail?id=11
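A minimal Java sketch of the points above. It is not Clojure's implementation; the class names `ConsList`, `AppendVector` and `HashDemo` are hypothetical, and it assumes the `java.util.List` hash recurrence `h = 31*h + hash(e)`, walked front to back. The list pays for one walk the first time its hash is requested and caches the result; `equals` checks identity first, then count and the cached hash, and compares element by element only as a last resort. The append-only vector updates the same recurrence on every append, which is the incremental scheme. Prepending to a list does not fit that simple update, because the new element comes first in the walk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class HashDemo {
    public static void main(String[] args) {
        ConsList<Integer> xs = new ConsList<>(1, new ConsList<>(2, new ConsList<>(3, null)));
        ConsList<Integer> ys = new ConsList<>(1, new ConsList<>(2, new ConsList<>(3, null)));
        AppendVector<Integer> v = new AppendVector<>();
        v.add(1); v.add(2); v.add(3);

        System.out.println(xs.equals(ys));                                 // true
        System.out.println(xs.hashCode() == List.of(1, 2, 3).hashCode()); // true
        System.out.println(v.hashCode() == xs.hashCode());                // true
    }
}

// Hypothetical immutable cons list (not Clojure's PersistentList).
final class ConsList<T> {
    final T first;
    final ConsList<T> rest; // null marks the end of the list
    final int count;        // stored at construction, so O(1)
    private int hash;       // 0 means "not computed yet"

    ConsList(T first, ConsList<T> rest) {
        this.first = first;
        this.rest = rest;
        this.count = 1 + (rest == null ? 0 : rest.count);
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            // One front-to-back walk, using the java.util.List recurrence.
            h = 1;
            for (ConsList<T> c = this; c != null; c = c.rest) {
                h = 31 * h + Objects.hashCode(c.first);
            }
            hash = h; // cached; later calls skip the walk
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;                      // identical: equal, full stop
        if (!(o instanceof ConsList)) return false;
        ConsList<?> a = this, b = (ConsList<?>) o;
        if (a.count != b.count) return false;            // O(1)
        if (a.hashCode() != b.hashCode()) return false;  // cached after the first walk
        for (; a != null; a = a.rest, b = b.rest) {      // value-by-value as a last resort
            if (!Objects.equals(a.first, b.first)) return false;
        }
        return true;
    }
}

// Hypothetical append-only vector (mutable for brevity) whose hash is
// updated on every append, spreading the cost over all insertions.
// equals is omitted; only the hash update matters here.
final class AppendVector<T> {
    private final List<T> items = new ArrayList<>();
    private int hash = 1; // hash of the empty list under the same recurrence

    void add(T x) {
        items.add(x);
        hash = 31 * hash + Objects.hashCode(x); // O(1) per append, no later walk
    }

    @Override
    public int hashCode() { return hash; }
}
```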

>
> My understanding is that eq?-based associative collections use a
> completely different hashing scheme, hashing on the address or some
> other unique/stable value associated with each object, and then use
> eq? (like identical?) to do the actual comparison when they get to the
> right "bin".
>

Yes, so? It's still not an argument for another collection family,  
which will only engender confusion as people make similar incorrect  
assumptions about performance and equality.
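For comparison, Java's standard library already ships an eq?-style map: `java.util.IdentityHashMap` hashes keys with `System.identityHashCode` and compares them with `==`. This small demo (Java rather than Clojure; `IdentityDemo` is a hypothetical name) shows the equality difference at stake: a key that is equal but not identical is found by a value-based `HashMap` and missed by the identity-based map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class IdentityDemo {
    public static void main(String[] args) {
        List<Integer> a = List.of(1, 2, 3);
        List<Integer> b = new ArrayList<>(a); // equal to a, but a distinct object

        // Value-based: for these java.util lists, hashCode() walks the
        // elements and equals() compares them.
        Map<List<Integer>, String> byValue = new HashMap<>();
        byValue.put(a, "cached");
        System.out.println(byValue.get(b));    // cached

        // Identity-based: System.identityHashCode plus ==, no element walk.
        Map<List<Integer>, String> byIdentity = new IdentityHashMap<>();
        byIdentity.put(a, "cached");
        System.out.println(byIdentity.get(a)); // cached
        System.out.println(byIdentity.get(b)); // null
    }
}
```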

> Another use case: consider sorting a collection of lengthy lists by
> their length. You want to cache the length of those lists. Using an
> identity-based hash-map would be the right kind of cache for such a
> thing, because you're looking up the exact same lists and want to be
> efficient about it (assuming my comments above about hash performance
> are correct).
>

This argument only applies if you never want to calculate the hash
code by a walk, even once. I think that's a false economy, and again,
not persuasive. With incremental hash code calculation, even less so.

Clojure has other data structures that address some of the
deficiencies of lists, but even PersistentList is Counted, i.e., it
provides its count in constant time, as do maps, sets, and vectors, as
well as array, vector, and string seqs, etc.
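To illustrate the point about counts: a minimal Java sketch (not Clojure's PersistentList; `CountedList` and `SortByLength` are hypothetical names) of a cons list that records its length when each cell is built, the same idea as Clojure's `Counted`. With an O(1) count, sorting lists by length needs no side cache, identity-based or otherwise.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortByLength {
    public static void main(String[] args) {
        List<CountedList> lists = new ArrayList<>(List.of(
                CountedList.of(1, 2, 3),
                CountedList.of(4),
                CountedList.of(5, 6)));
        // Each comparison reads a stored field: no walk and no cache.
        lists.sort(Comparator.comparingInt((CountedList l) -> l.count));
        for (CountedList l : lists) {
            System.out.println(l.count); // prints 1, then 2, then 3
        }
    }
}

// Hypothetical cons cell that records its length at construction time.
final class CountedList {
    final int head;
    final CountedList tail; // null marks the end of the list
    final int count;

    CountedList(int head, CountedList tail) {
        this.head = head;
        this.tail = tail;
        this.count = 1 + (tail == null ? 0 : tail.count);
    }

    // Builds a list back to front by prepending; requires at least one element.
    static CountedList of(int first, int... more) {
        CountedList l = null;
        for (int i = more.length - 1; i >= 0; i--) {
            l = new CountedList(more[i], l);
        }
        return new CountedList(first, l);
    }
}
```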

Rich


You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To post to this group, send email to clojure@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/clojure?hl=en