What I suggested is far, far more conservative than anything else proposed
on this or on the dev thread.

Changing the hash implementation of sets and vectors etc. is likely to have
unintended consequences for performance -
http://www.scala-lang.org/old/node/8510.html

The original Scala code used case classes - like defrecord they provide a
lot of functionality out of the box and don't require defining hash /
equals methods. In the past Scala case classes had bad hash code
distribution, this is no longer the case.

What I'm suggesting is to leave everything else alone, but only make the
sensible change so that the Scala code can be reasonably ported to Clojure.
This simply requires giving defrecord a better hashing strategy. I suspect
we can duplicate the approach already taken by Scala for case classes.

David

On Fri, Oct 25, 2013 at 12:11 PM, Stuart Halloway <stuart.hallo...@gmail.com
> wrote:

> To save others from having to read this whole thread, is the following
> summary correct?
>
> 1. We believe that the performance issue Paul first encountered is
> substantially / entirely caused by hash collisions.
>
> 2. The hash collisions could be improved by spreading the hashes of
> numbers (and not needing to mess with hashes of collections).
>
> If the above summary is correct, then I have the following questions:
>
> a. Is the approach Scala took substantially #2 above, or something
> different / more?
>
> b. Is changing the hash codes of numbers going to break expectations on
> the Java side of things?
>
>
> On Thu, Oct 24, 2013 at 8:37 PM, Evan Gamble <solar.f...@gmail.com> wrote:
>
>> As Andy Fingerhut pointed out, since (hash [a b]) is 31*(a + 30) + (b +
>> 31) when a and b are ints, summing the hashes of 2-tuples when the values
>> are small ints, as happens when hashing sets of such 2-tuples, is quite
>> likely to lead to collisions. This particular problem could be avoided by
>> spreading the hashes of ints out to large numbers with a non-linear
>> function, not just multiplying them all by some N.
>>
>>
>> On Wednesday, October 23, 2013 5:41:15 PM UTC-7, puzzler wrote:
>>
>>> Perhaps I don't understand your point, but vector hashing is not
>>> currently the sum of hashes.  It's a more complex computation.
>>>
>>> => (hash [0 2])
>>> 963
>>> => (hash [2 0])
>>> 1023
>>> => (hash [2])
>>> 33
>>>
>>> The fact that numbers hash to themselves means that the resulting hashes
>>> are not hugely spread apart, but collisions don't seem to be as common as
>>> with sets and maps.  I'm sure it could be improved, but vector/sequence
>>> hashing doesn't strike me as a huge issue.
>>>
>>>
>>> On Wed, Oct 23, 2013 at 5:31 PM, Cedric Greevey <cgre...@gmail.com>wrote:
>>>
>>>> There's still a problem with collisions among *vectors* though.
>>>>
>>>> I'd say the real underlying problem is that small integers hash to
>>>> themselves. If you let N be a fairly large number satisfying GCD(N, 2^32 -
>>>> 1) = 1, then multiplying 32-bit integers by N would send nearby integers to
>>>> widely separated ones, while inducing a permutation on nonzero integers --
>>>> the hash space wouldn't be any smaller. That alone won't help much with the
>>>> vector problem, since 3N + 2N = 5N just as 3 + 2 = 5, though it may help
>>>> avoid collisions with small integers in small hash maps when the hash is
>>>> being used modulo a small hash-table length. What might help with vectors,
>>>> sets, and other small data structures containing integers (but would shrink
>>>> the hash space by about half) is to additionally square the result (all
>>>> operations being 2s-complement 32-bit integer operations with wrapping --
>>>> Java "int" arithmetic). Now 1 and 4 won't give the same summed hashes as 3
>>>> and 2, and 0 and 5 will be different again. You might also want to add some
>>>> constant to avoid 0 hashing to 0.
>>>>
>>>>  --
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clojure@googlegroups.com
>> Note that posts from new members are moderated - please be patient with
>> your first post.
>> To unsubscribe from this group, send email to
>> clojure+unsubscr...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "Clojure" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to clojure+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>  --
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to