Agreed, set hashing is definitely a bigger issue.

Also agree array indexing is is a pretty common use case, though if people 
care about performance for manipulating arrays of data with integer indexes 
then they should probably be looking at core.matrix rather than trying to 
simulate array programming with maps and vectors. Right tool for the right 
job, and all that.....

On Thursday, 24 October 2013 12:40:13 UTC+8, puzzler wrote:

> OK, I get that if lists have to match the Java hash code, then it's a 
> nonstarter to try to improve it.  However, Clojure makes a distinction 
> between hasheq and hashCode, one of which is used within Clojure and one of 
> which is used for Java interop, and I don't think they necessarily need to 
> match.
>
> I don't really consider this to be a contrived case.  It's pretty typical, 
> in my experience, for Clojure programmers to choose to represent a 2D array 
> as a map from [x y] to values.  That's often more practical to intialize 
> and work with than a vectors-in-vector layout.  So a map with keys ranging 
> from [0 0] to [199 199] would be what you'd get for a 200x200 array.
>
> Having 6-7 objects per bucket isn't the end of the world, but it means 
> that all your lookups are several times slower than they could be.
>
> All that said, I think that improving set hashing is a more pressing issue.
>
>
>
> On Wed, Oct 23, 2013 at 9:05 PM, Mikera <mike.r.an...@gmail.com<javascript:>
> > wrote:
>
>> On Thursday, 24 October 2013 09:12:21 UTC+8, puzzler wrote:
>>
>>> Even though vectors aren't as prone to collision between similar-looking 
>>> vectors, I think you are right that the smallness of the numbers is a 
>>> problem:
>>>
>>> Consider the following:
>>>
>>> => (count (set (for [x (range 200) y (range 200)] (hash [x y]))))
>>> 6369
>>>
>>> This means that out of 40,000 common ordered pairs, there are only 6369 
>>> distinct hash values.  That's not good.
>>>
>>> So yes, upon further thought, I think sequence hashing should be 
>>> improved as well.
>>>
>>> I want to emphasize that I don't view this as merely a "theoretical" 
>>> problem.  I have noticed for over a year now, when profiling my Clojure 
>>> code, that a surprising amount of time was spent on dealing with hash 
>>> collisions.  I chalked it up as the natural behavior of hash tables which 
>>> will occasionally have collisions, and just learned to live with it, but 
>>> the more I play around with the existing hash function on data similar to 
>>> what I use (with small numbers in slightly varying structures), the more 
>>> I'm convinced that Clojure's weak hashing strategy explains the performance 
>>> issues I've had.  Would love to see the Clojure community tackle this 
>>> head-on; I think there's a lot of performance to be gained for many real 
>>> apps by doing this.
>>>
>>
>> You can't change the hashCode of lists / vectors without breaking the 
>> contract for java.util.List (which PersistentList and PersistentVector both 
>> implement) - 
>> http://docs.oracle.com/javase/7/docs/api/java/util/List.html#hashCode()
>>
>> I don't think breaking this would be a good idea.
>>
>> The java.util.List hashCode isn't actually too bad.  It's a decent 
>> compromise between an efficient implementation and reasonably effective 
>> hashing for most real-world use cases.
>>
>> Sure you can construct artificial examples where you get some collisions, 
>> but that's true of any hash code function. But note that the maximum number 
>> of collisions is still pretty low:
>>
>> (apply max (vals (frequencies (for [x (range 200) y (range 200)] (hash [x 
>> y])))))
>> => 7
>>
>> That's actually the number you really care about, since it is the number 
>> of objects in the same hash bucket that drives the pathological O(n^2) 
>> behaviours.
>>  
>>
>> -- 
>> -- 
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clo...@googlegroups.com<javascript:>
>> Note that posts from new members are moderated - please be patient with 
>> your first post.
>> To unsubscribe from this group, send email to
>> clojure+u...@googlegroups.com <javascript:>
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "Clojure" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to clojure+u...@googlegroups.com <javascript:>.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to