On Tue, Feb 18, 2020 at 9:12 AM Alex Herbert <alex.d.herb...@gmail.com>
wrote:


> > My maths is rusty.  If A=0xF000ABCD as interpreted as an unsigned  and
> > B=0xF000ABCD but interpreted as a signed does (A mod N) = (B mod N) for
> all
> > positive values of N?  If so then you are correct and Signedness does not
> > matter and can be removed.  I thought that the statement was false.
> >
>
> Yes, you are correct. Addition and multiplication use the bits in the same
> way. Modulus does not.
>
> So are the Murmur3 hash functions actually unsigned? I thought the
> original c code returned unsigned 64-bit values (uint64_t).
>
> IIUC the purpose of the Signedness enum is to specify that a negative
> value returned from the hash function should be used as a numerical
> magnitude if then used in downstream computations since sign extension on
> negatives will set leading bits to 1s. I.e. using the negative as if it
> were unsigned (and all bits are random) is not valid. Perhaps you can give
> an example of how signed or unsigned would be handled differently.
>
>
The murmur hash is not a good example here as it returns the resulting
buffer as 2 signed longs.  The MD5 is probably a better example as the
messageDigest.digest() returns a byte[128].  The code then interprets that
as 2 signed longs.  A C implementation could interpret that as 2 unsigned
longs.  Additionally, there are hashes that return smaller buffers such as
with murmur2 which returns a 32bit unsigned int in the C code.  In java
that could be represented as a long as per the Integer.asUnsignedlong()
method or it could just be cast to a long and thus signed.

The values returned are then passed to floorMod( value, numberOfBits ) to
turn on the appropriate bit in the resulting bit vector.  As noted above
this will yield different values depending upon how the byte[] was
represented.

Claude

Reply via email to