8 Şub 2019 Cum 11:57 tarihinde Michael Jones <michael.jo...@gmail.com> şunu yazdı:
> ...says that in one particular test condition (8 character strings, 1M > random strings, all possible shift values) > and under one particular metric of virtue... > > x = x<<15 ^ x>>33 > > ...gives the closest overall approximation to a random result. you might > try that in your application to see how well it works for you. that figure > of merit (44.608) is not particularly high but not terrible. A modification > of FNV1a can get to 22 (almost: 2.200000000000000355e+01) here. > Hm, so overlapping shifts help get better mixing. As a hash function gets better at mixing bits, it will almost surely have a collision (due to birthday paradox) with two inputs with lengths <= 4. Can we artificially guarantee a minimum sum of colliding (utf8 string) inputs that is > 8 ? I think such a hash needs to satisfy the following: 1) Number of valid utf8 strings with length <= 8 is less than (255^9 - 1)/254 which is less than 2^64 So if the hash maps these strings to distinct uint64 values, then minimum sum of colliding input lengths is at least 9. Can it be > 9 ? 2) Be non-trivial. For example, this hash satisfies 1): if len(input) < 8, pad with zeros up to total 8 bytes and return otherwise return first 8 bytes of input this also satisfies 1): Treat input as integer in base 255, return input mod 2^64 3) have equidistribution for overall inputs Instead of statistical methods, we can maybe brute search among parametric simple hash functions for one that satisfies these 3 properties? Or explicitly design one? > -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.