As Andy Fingerhut pointed out, since (hash [a b]) is 31*(a + 30) + (b + 31) 
when a and b are ints, summing the hashes of 2-tuples when the values are 
small ints, as happens when hashing sets of such 2-tuples, is quite likely 
to lead to collisions. This particular problem could be avoided by 
spreading the hashes of ints out to large numbers with a non-linear 
function, not just multiplying them all by some N.

On Wednesday, October 23, 2013 5:41:15 PM UTC-7, puzzler wrote:
>
> Perhaps I don't understand your point, but vector hashing is not currently 
> the sum of hashes.  It's a more complex computation.
>
> => (hash [0 2])
> 963
> => (hash [2 0])
> 1023
> => (hash [2])
> 33
>
> The fact that numbers hash to themselves means that the resulting hashes 
> are not hugely spread apart, but collisions don't seem to be as common as 
> with sets and maps.  I'm sure it could be improved, but vector/sequence 
> hashing doesn't strike me as a huge issue.
>
>
> On Wed, Oct 23, 2013 at 5:31 PM, Cedric Greevey 
> <cgre...@gmail.com<javascript:>
> > wrote:
>
>> There's still a problem with collisions among *vectors* though.
>>
>> I'd say the real underlying problem is that small integers hash to 
>> themselves. If you let N be a fairly large number satisfying GCD(N, 2^32 - 
>> 1) = 1, then multiplying 32-bit integers by N would send nearby integers to 
>> widely separated ones, while inducing a permutation on nonzero integers -- 
>> the hash space wouldn't be any smaller. That alone won't help much with the 
>> vector problem, since 3N + 2N = 5N just as 3 + 2 = 5, though it may help 
>> avoid collisions with small integers in small hash maps when the hash is 
>> being used modulo a small hash-table length. What might help with vectors, 
>> sets, and other small data structures containing integers (but would shrink 
>> the hash space by about half) is to additionally square the result (all 
>> operations being 2s-complement 32-bit integer operations with wrapping -- 
>> Java "int" arithmetic). Now 1 and 4 won't give the same summed hashes as 3 
>> and 2, and 0 and 5 will be different again. You might also want to add some 
>> constant to avoid 0 hashing to 0.
>>
>>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to