> On 18 Jan 2020, at 16:09, Gary Gregory <garydgreg...@gmail.com> wrote:
> 
> Any thoughts of the effect of this on the Commons Collections Bloom filter
> proposal?
> 

It would have no effect. Claude’s Bloom filter code was what spotted the bug in 
the MurmurHash3 implementations. We fixed hash32 and hash128 by creating new 
methods. I presume the Bloom filter code is using the new implementations.

The real issue is that the old hash128 was meant to remain unchanged. However, 
Andy Seaborne pointed out that the old hash128 implementation calls the new 
hash128x64 implementation, so the old one is no longer broken. The idea was to 
keep it broken. Unfortunately, somewhere in the history of the changes I dropped 
the cast of the int to a long, so the old method called the wrong new version 
with an int seed, not a long seed. I cannot see where this happened in the git 
history, but the code is like that as of my last commits, so I must have done 
it. I see the result as:

- codec 1.14 hash128 works differently from codec 1.13 hash128 if you use the 
methods and pass them a negative seed
- codec 1.14 marks all the hash128 methods as deprecated so there is a warning 
there that something has changed
- unfortunately the warning states the hash was wrong and advises you to 
update, whereas the release actually changes the old hash too, so a user who 
reads the deprecation comment is misled
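
To make the dropped cast concrete, here is a minimal sketch of how overload 
resolution changes the result. It is not the codec source: the class and 
method names are hypothetical, the "hash" simply returns the seed it was 
given, and it assumes the new int-seed method treats the seed as unsigned.

```java
class SeedOverloadDemo {

    // Hypothetical stand-in for the new int-seed method.
    // Assumption: it treats the seed as an unsigned 32-bit value.
    static long newHash(int seed) {
        return newHash(seed & 0xffffffffL);
    }

    // Hypothetical stand-in for the long-seed method.
    // It returns the seed it received so the difference is visible.
    static long newHash(long seed) {
        return seed;
    }

    // What the old method was meant to do: cast to long first, so the
    // long overload is chosen and a negative seed is sign-extended.
    static long oldHashIntended(int seed) {
        return newHash((long) seed);
    }

    // What happens once the cast is dropped: the int overload is chosen
    // and a negative seed is no longer sign-extended.
    static long oldHashWithoutCast(int seed) {
        return newHash(seed);
    }

    public static void main(String[] args) {
        for (int seed : new int[] {42, -1}) {
            System.out.println("seed=" + seed
                + " intended=" + oldHashIntended(seed)
                + " withoutCast=" + oldHashWithoutCast(seed));
        }
    }
}
```

In this sketch the two paths agree for a non-negative seed and differ only for 
a negative one, which matches the first point above.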

This fix restores hash128 to the broken version, which may be important for 
someone who started using it in 1.13 and wants to maintain the same hashes. 
Such users should revert to 1.13 and wait for 1.15 (or 1.14.1) before taking 
upgrades to codec. But how do we announce this to users?

Do we:

- push out a quick 1.14.1 patch release
- send out an e-mail to the users mailing list advising users to revert to 1.13 
if hash128 is key to their app
- just leave it and wait for complaints to roll in (if any ever do)

The safest option is a patch release which then goes through the announce 
channels and this issue is highlighted as a regression.

I wondered about not doing a release, based on my own use cases for a hash. 
These use the hash as a quick check to avoid having to do a more intensive 
check. In that case, if the hash is different I just end up having to do a bit 
more work until all the new hashes trickle through, replacing the old ones. 
This could be a problem if you have millions of hashes. However, there could be 
cases where the hash is a more permanent part of the system and changing its 
behaviour would break the system. In that instance a user would have to track 
the bug to the change in codec and revert to 1.13.
 
WDYT?

Alex


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org
