On 06/04/2016 19:44, Emilio G. Cota wrote:
> I like this idea, because the ugliness of the sizeof checks is significant.
> However, the quality of the resulting hash is not as good when always using
> func5.
> For instance, when we'd otherwise use func3, two fifths of every input contain
> exactly the same bits: all 0's. This inevitably leads to more collisions.

It doesn't necessarily lead to more collisions. For a completely stupid
hash function "a+b+c+d+e", for example, adding zeros doesn't add more
collisions.

What if you rearrange the five words so that the "all 0" parts of the
input are all at the beginning, or all at the end? Perhaps the problem
is that the odd words at the end are hashed less effectively.

It might be better to always use a three-word xxhash, but pick the
64-bit version if either phys_pc or pc is 64 bits wide. The unrolling
would be very effective, and the performance penalty would not be too
important (64-bit on 32-bit is very slow anyway).

If you can fix the problems with the collisions, it also means that you
get good performance running 32-bit guests on qemu-system-x86_64 or
qemu-system-aarch64. So, if you can do it, it's a very desirable
property.

Paolo
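
P.S. A minimal sketch of the additive example, in case it helps. The
names sum3, sum5 and padding_is_neutral are made up for illustration;
they are not QEMU functions and this is not the hash under discussion.

```c
#include <stdint.h>

/* Toy hash from the example: the plain sum of five words (hypothetical). */
static uint32_t sum5(uint32_t a, uint32_t b, uint32_t c,
                     uint32_t d, uint32_t e)
{
    return a + b + c + d + e;
}

/* Three-word variant of the same toy hash (hypothetical). */
static uint32_t sum3(uint32_t a, uint32_t b, uint32_t c)
{
    return a + b + c;
}

/*
 * Returns 1 if padding three words with two zero words, either at the
 * end or at the beginning, leaves the toy hash value unchanged.
 */
static int padding_is_neutral(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t h = sum3(a, b, c);

    return sum5(a, b, c, 0, 0) == h && sum5(0, 0, a, b, c) == h;
}
```

Since the padded and unpadded values are identical for every input, two
inputs collide under sum5 exactly when they collide under sum3. This
only shows that zero words are not harmful by themselves; for a mixing
hash such as xxhash, where the zero words sit may still matter.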