As Andy Fingerhut pointed out, since (hash [a b]) is 31*(a + 30) + (b + 31) when a and b are ints, summing the hashes of 2-tuples when the values are small ints, as happens when hashing sets of such 2-tuples, is quite likely to lead to collisions. This particular problem could be avoided by spreading the hashes of ints out to large numbers with a non-linear function, not just multiplying them all by some N.
On Wednesday, October 23, 2013 5:41:15 PM UTC-7, puzzler wrote: > > Perhaps I don't understand your point, but vector hashing is not currently > the sum of hashes. It's a more complex computation. > > => (hash [0 2]) > 963 > => (hash [2 0]) > 1023 > => (hash [2]) > 33 > > The fact that numbers hash to themselves means that the resulting hashes > are not hugely spread apart, but collisions don't seem to be as common as > with sets and maps. I'm sure it could be improved, but vector/sequence > hashing doesn't strike me as a huge issue. > > > On Wed, Oct 23, 2013 at 5:31 PM, Cedric Greevey > <cgre...@gmail.com<javascript:> > > wrote: > >> There's still a problem with collisions among *vectors* though. >> >> I'd say the real underlying problem is that small integers hash to >> themselves. If you let N be a fairly large number satisfying GCD(N, 2^32 - >> 1) = 1, then multiplying 32-bit integers by N would send nearby integers to >> widely separated ones, while inducing a permutation on nonzero integers -- >> the hash space wouldn't be any smaller. That alone won't help much with the >> vector problem, since 3N + 2N = 5N just as 3 + 2 = 5, though it may help >> avoid collisions with small integers in small hash maps when the hash is >> being used modulo a small hash-table length. What might help with vectors, >> sets, and other small data structures containing integers (but would shrink >> the hash space by about half) is to additionally square the result (all >> operations being 2s-complement 32-bit integer operations with wrapping -- >> Java "int" arithmetic). Now 1 and 4 won't give the same summed hashes as 3 >> and 2, and 0 and 5 will be different again. You might also want to add some >> constant to avoid 0 hashing to 0. >> >> -- -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups "Clojure" group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.