> On 18 Jan 2020, at 16:09, Gary Gregory <garydgreg...@gmail.com> wrote:
>
> Any thoughts of the effect of this on the Commons Collections Bloom filter
> proposal?
>
It would have no effect. Claude’s Bloom filter code was what spotted the bug in
the MurmurHash3 implementations. We fixed hash32 and hash128 by creating new
methods. I presume the Bloom filter code is using the new implementations.

The real issue is that the old hash128 was meant to remain unchanged. However,
Andy Seaborne pointed out that the old hash128 implementation calls the new
hash128x64 implementation, so the old one is no longer broken. The idea was to
keep it broken. Unfortunately, somewhere in the history of the changes I dropped
the cast of the int to a long, so the old method called the wrong new version,
with an int seed rather than a long seed. I cannot see where this happened in
the git history, but the code is like that as of my last commits, so I must
have done it. I see the result as:
- codec 1.14 hash128 works differently from codec 1.13 hash128 if you use the
methods and pass them a negative seed
- codec 1.14 marks all the hash128 methods as deprecated, so there is a warning
that something has changed
- unfortunately the warning states that the hash was wrong and advises you to
update, whereas 1.14 actually changes the old hash too, so a user who reads the
deprecation comment is misled
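
To illustrate why only negative seeds are affected, here is a minimal sketch.
It assumes the int-seed method treats the seed as an unsigned 32-bit value,
while the long-seed path receives the sign-extended value that a cast produces.
The class and method names are hypothetical stand-ins, not the Commons Codec
code:

```java
// Hypothetical stand-ins for illustration only; not the Commons Codec methods.
final class SeedWidening {

    /** Widens with sign extension, as a (long) cast or implicit widening does. */
    static long signExtend(int seed) {
        return seed; // -1 becomes 0xffffffffffffffffL
    }

    /** Widens treating the int as an unsigned 32-bit value (assumed new behaviour). */
    static long maskUnsigned(int seed) {
        return seed & 0xffffffffL; // -1 becomes 0x00000000ffffffffL
    }

    // An overload pair: the static type of the argument selects the method,
    // so dropping a (long) cast at the call site silently switches overloads.
    static String hash(int seed) {
        return "int overload, seed=" + Long.toHexString(maskUnsigned(seed));
    }

    static String hash(long seed) {
        return "long overload, seed=" + Long.toHexString(seed);
    }

    public static void main(String[] args) {
        int seed = -1;
        System.out.println(hash(seed));        // int overload, seed=ffffffff
        System.out.println(hash((long) seed)); // long overload, seed=ffffffffffffffff
        // Non-negative seeds widen to the same value either way.
        System.out.println(signExtend(42) == maskUnsigned(42)); // true
    }
}
```

With a non-negative seed both paths see the same 64-bit value, which would
explain why the difference only shows up for negative seeds.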
This fix restores hash128 to the broken version, which may be important for
someone who started using it with 1.13 and wants to maintain the same hashes.
They should revert to 1.13 if they wish to do this, and wait for 1.15 (or
1.14.1) to get upgrades to codec. But how do we announce this to users?
Do we:
- push out a quick 1.14.1 patch release
- send out an e-mail to the users mailing list advising to revert to 1.13 if
hash128 is key to their app
- just leave it and wait for complaints to roll in (if any ever do)
The safest option is a patch release, which then goes through the announce
channels with this issue highlighted as a regression.

I wondered about not doing a release, based on my own use cases for a hash:
a quick check that avoids having to do a more intensive check. In that case, if
the hash is different I just end up doing a bit more work until all the new
hashes trickle through and replace the old ones. This could be a problem if you
have millions of hashes. However, there could be cases where the hash is a more
permanent part of the system and changing its functionality would break the
system. In that instance a user would have to track the bug to the change in
codec and revert to 1.13.
WDYT?
Alex