> On 3 Nov 2019, at 21:45, Gary Gregory <[email protected]> wrote:
>
> I feel like I am missing something basic in the assumption of this issue:
> there is no such thing as an unsigned int in Java and the ticket talks
> about (C?) unsigned ints. Please help me understand how or why we should
> care about C vs. Java ints. Are we comparing apples to oranges here?
When a byte is converted to an int it is sign-extended. If the byte is
negative, all bits above the 8th bit are set to 1, so the int used in the bit
shift and xor operations does not have the same raw bits as the unsigned value
of the byte. Masking with 0xff clears those upper bits.
These are not the same:
byte b = -1;
(int) b != (b & 0xff);            // 0xffffffff vs 0x000000ff
b << 8 != (b & 0xff) << 8;        // 0xffffff00 vs 0x0000ff00
b << 16 != (b & 0xff) << 16;      // 0xffff0000 vs 0x00ff0000
The original code uses the 0xff mask for most of the murmur3 algorithm. It was
omitted from the final steps applied to the last 3 bytes in the hash32
algorithm variant.
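
As a rough sketch of the effect on those last bytes (the class and method names
here are hypothetical and this is not the actual commons-codec code), the
following combines up to 3 trailing bytes into an int with and without the
mask. It only shows the combining step, not the full hash:

```java
// Illustrative sketch only: TailMixDemo, tailUnmasked and tailMasked are
// hypothetical names, not part of commons-codec.
final class TailMixDemo {

    /** Combines up to 3 trailing bytes without masking; negative bytes sign-extend. */
    static int tailUnmasked(byte[] data, int offset, int remaining) {
        int k1 = 0;
        switch (remaining) {
            case 3:
                k1 ^= data[offset + 2] << 16;
                // fall through
            case 2:
                k1 ^= data[offset + 1] << 8;
                // fall through
            case 1:
                k1 ^= data[offset];
                break;
            default:
                break;
        }
        return k1;
    }

    /** Same as above, but each byte is masked with 0xff before shifting. */
    static int tailMasked(byte[] data, int offset, int remaining) {
        int k1 = 0;
        switch (remaining) {
            case 3:
                k1 ^= (data[offset + 2] & 0xff) << 16;
                // fall through
            case 2:
                k1 ^= (data[offset + 1] & 0xff) << 8;
                // fall through
            case 1:
                k1 ^= (data[offset] & 0xff);
                break;
            default:
                break;
        }
        return k1;
    }

    public static void main(String[] args) {
        byte[] positive = {0x01, 0x02, 0x03};
        byte[] negative = {0x01, (byte) 0x80, 0x03};
        // Non-negative bytes: both variants agree.
        System.out.printf("%08x %08x%n",
                tailUnmasked(positive, 0, 3), tailMasked(positive, 0, 3));
        // One negative byte: the unmasked variant sets the upper bits.
        System.out.printf("%08x %08x%n",
                tailUnmasked(negative, 0, 3), tailMasked(negative, 0, 3));
    }
}
```

With only non-negative bytes the two variants agree, which would explain why
tests that never put a negative byte in the tail did not catch this.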
Alex
>
> Thank you,
> Gary
>
> On Sun, Nov 3, 2019, 07:59 Claude Warren <[email protected]> wrote:
>
>> There is an error in the current Murmur3 code introduced by sign extension
>> errors. This is documented in CODEC-264.[1]
>>
>> I have created a pull request to fix it.[2]
>>
>> While the code changes did not change any of the existing Murmur3 tests, I
>> did add new tests that failed until the changes were applied.
>>
>> However, I am concerned that the commons-codec Murmur3 code is in the wild
>> and this change will impact some hashes.
>>
>> Therefore, I am asking for a discussion concerning how to apply any patches
>> like CODEC-264 to hashing algorithms that are in the wild.
>>
>> Thx,
>> Claude
>>
>> [1] https://issues.apache.org/jira/browse/CODEC-264
>> [2] https://github.com/apache/commons-codec/pull/27
>>