8 Şub 2019 Cum 11:57 tarihinde Michael Jones <michael.jo...@gmail.com> şunu
yazdı:

> ...says that in one particular test condition (8 character strings, 1M
> random strings, all possible shift values)
> and under one particular metric of virtue...
>
> x = x<<15 ^ x>>33
>
> ...gives the closest overall approximation to a random result. you might
> try that in your application to see how well it works for you. that figure
> of merit (44.608) is not particularly high but not terrible. A modification
> of FNV1a can get to 22 (almost: 2.200000000000000355e+01) here.
>

Hm, so overlapping shifts help get better mixing. As a hash function gets
better at mixing bits, it will almost surely have a collision (due to
birthday paradox) with two inputs with lengths <= 4.

Can we artificially guarantee a minimum sum of colliding (utf8 string)
inputs that is > 8 ?

I think such a hash needs to satisfy the following:

1) Number of valid utf8 strings with length <= 8 is less than (255^9 -
1)/254 which is less than 2^64
So if the hash maps these strings to distinct uint64 values, then minimum
sum of colliding input lengths is at least 9. Can it be > 9 ?

2) Be non-trivial. For example, this hash satisfies 1):
if len(input) < 8, pad with zeros up to total 8 bytes and return
otherwise return first 8 bytes of input

this also satisfies 1):
Treat input as integer in base 255, return input mod 2^64

3) have equidistribution for overall inputs

Instead of statistical methods, we can maybe brute search among parametric
simple hash functions for one that satisfies these 3 properties? Or
explicitly design one?

>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to