To save others from having to read this whole thread, is the following
summary correct?

1. We believe that the performance issue Paul first encountered is
substantially / entirely caused by hash collisions.

2. The hash collisions could be improved by spreading the hashes of numbers
(and not needing to mess with hashes of collections).

If the above summary is correct, then I have the following questions:

a. Is the approach Scala took substantially #2 above, or something
different / more?

b. Is changing the hash codes of numbers going to break expectations on the
Java side of things?


On Thu, Oct 24, 2013 at 8:37 PM, Evan Gamble <solar.f...@gmail.com> wrote:

> As Andy Fingerhut pointed out, since (hash [a b]) is 31*(a + 30) + (b +
> 31) when a and b are ints, summing the hashes of 2-tuples when the values
> are small ints, as happens when hashing sets of such 2-tuples, is quite
> likely to lead to collisions. This particular problem could be avoided by
> spreading the hashes of ints out to large numbers with a non-linear
> function, not just multiplying them all by some N.
>
>
> On Wednesday, October 23, 2013 5:41:15 PM UTC-7, puzzler wrote:
>
>> Perhaps I don't understand your point, but vector hashing is not
>> currently the sum of hashes.  It's a more complex computation.
>>
>> => (hash [0 2])
>> 963
>> => (hash [2 0])
>> 1023
>> => (hash [2])
>> 33
>>
>> The fact that numbers hash to themselves means that the resulting hashes
>> are not hugely spread apart, but collisions don't seem to be as common as
>> with sets and maps.  I'm sure it could be improved, but vector/sequence
>> hashing doesn't strike me as a huge issue.
>>
>>
>> On Wed, Oct 23, 2013 at 5:31 PM, Cedric Greevey <cgre...@gmail.com>wrote:
>>
>>> There's still a problem with collisions among *vectors* though.
>>>
>>> I'd say the real underlying problem is that small integers hash to
>>> themselves. If you let N be a fairly large number satisfying GCD(N, 2^32 -
>>> 1) = 1, then multiplying 32-bit integers by N would send nearby integers to
>>> widely separated ones, while inducing a permutation on nonzero integers --
>>> the hash space wouldn't be any smaller. That alone won't help much with the
>>> vector problem, since 3N + 2N = 5N just as 3 + 2 = 5, though it may help
>>> avoid collisions with small integers in small hash maps when the hash is
>>> being used modulo a small hash-table length. What might help with vectors,
>>> sets, and other small data structures containing integers (but would shrink
>>> the hash space by about half) is to additionally square the result (all
>>> operations being 2s-complement 32-bit integer operations with wrapping --
>>> Java "int" arithmetic). Now 1 and 4 won't give the same summed hashes as 3
>>> and 2, and 0 and 5 will be different again. You might also want to add some
>>> constant to avoid 0 hashing to 0.
>>>
>>>  --
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to