I was going to modify the BloomFilter interface as discussed on this thread.

However before that I have made some changes to the current code to tidy it up. 
PR is [1]. The changes target the following:

1. Test coverage.

Coverage is now close to 100%. The AbstractBloomFilterTest was modified to test 
merge and cardinality with a filter that was not the filter under test.

The HasherBloomFilter cannot achieve 100%. This is because I’ve added assert 
statements and JaCoCo doesn’t ignore them. The asserts are there because the 
code can assume that a non-empty StaticHasher contains at least 1 bit index. If 
we really can assume this then the asserts can be removed.


2. toString

I’ve removed BitSetBloomFilter toString(). A very big filter could cause an 
overflow of a char[] used to represent a string.


3. Improve the Shape class

This is a big change, mainly for all the extra javadoc.

The input desired probability should be tested for NaN so I added that.

All the exception messages are the same in all the constructors and they are 
documented.

Made Shape final. There are computations in there that do not use the getter 
methods to get the Shape values. So the class hasn’t been designed for 
extension.

If we make it final then the rest of the code can be happy that a Shape will 
function as expected, i.e. a user cannot provide their own shape. Q. Is this 
desirable or not?

If not then I can change it and we properly update it to be designed for 
extension.


4. Fix Shape.hashCode

Calling Objects.hash(HashFunctionIdentity, …) would not work as the 
HashFunctionIdentity is an interface with no hashCode implementation. So we 
have to generate the hash using the same fields that are used to test equality 
of HashFunctionIdentity objects. This is done in the HashFunctionValidator.


Q. Why does the shape precompute the hashCode? A shape is unlikely to be used 
as a key in a Map and so caching the hash seems like extra work for something 
that won’t be used. I suggest we drop the cache and compute the hashCode 
dynamically.

Alex


[1] https://github.com/apache/commons-collections/pull/140 
<https://github.com/apache/commons-collections/pull/140>


Reply via email to