Even though vectors aren't as prone to collision between similar-looking
vectors, I think you are right that the smallness of the numbers is a
problem:

Consider the following:

=> (count (set (for [x (range 200) y (range 200)] (hash [x y]))))
6369

This means that out of 40,000 common ordered pairs, there are only 6369
distinct hash values.  That's not good.

So yes, upon further thought, I think sequence hashing should be improved
as well.

I want to emphasize that I don't view this as merely a "theoretical"
problem.  I have noticed for over a year now, when profiling my Clojure
code, that a surprising amount of time was spent on dealing with hash
collisions.  I chalked it up as the natural behavior of hash tables which
will occasionally have collisions, and just learned to live with it, but
the more I play around with the existing hash function on data similar to
what I use (with small numbers in slightly varying structures), the more
I'm convinced that Clojure's weak hashing strategy explains the performance
issues I've had.  Would love to see the Clojure community tackle this
head-on; I think there's a lot of performance to be gained for many real
apps by doing this.

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to