On Tue, Feb 18, 2020 at 9:12 AM Alex Herbert <alex.d.herb...@gmail.com> wrote:
> > My maths is rusty. If A=0xF000ABCD as interpreted as an unsigned and > > B=0xF000ABCD but interpreted as a signed does (A mod N) = (B mod N) for > all > > positive values of N? If so then you are correct and Signedness does not > > matter and can be removed. I thought that the statement was false. > > > > Yes, you are correct. Addition and multiplication use the bits in the same > way. Modulus does not. > > So are the Murmur3 hash functions actually unsigned? I thought the > original c code returned unsigned 64-bit values (uint64_t). > > IIUC the purpose of the Signedness enum is to specify that a negative > value returned from the hash function should be used as a numerical > magnitude if then used in downstream computations since sign extension on > negatives will set leading bits to 1s. I.e. using the negative as if it > were unsigned (and all bits are random) is not valid. Perhaps you can give > an example of how signed or unsigned would be handled differently. > > The murmur hash is not a good example here as it returns the resulting buffer as 2 signed longs. The MD5 is probably a better example as the messageDigest.digest() returns a byte[128]. The code then interprets that as 2 signed longs. A C implementation could interpret that as 2 unsigned longs. Additionally, there are hashes that return smaller buffers such as with murmur2 which returns a 32bit unsigned int in the C code. In java that could be represented as a long as per the Integer.asUnsignedlong() method or it could just be cast to a long and thus signed. The values returned are then passed to floorMod( value, numberOfBits ) to turn on the appropriate bit in the resulting bit vector. As noted above this will yield different values depending upon how the byte[] was represented. Claude