With the upcoming change, the StaticHasher usage model has changed. It was serving two purposes:
1. As a mechanism to preserve the list of integers from the BloomFilter, as well as the shape.
2. As a way to construct a Hasher from a collection of integers and a shape, so that they could be merged into a Bloom filter without the overhead of constructing a temporary Bloom filter to use in the merge.

The first purpose goes away with the removal of getHasher() and the addition of iterator() to the BloomFilter interface. For the second, I think we need a Hasher that accepts a shape and either a collection of Integers or a function producing an iterator. Something like:

{code:java}
// Needs java.util.Collection, java.util.PrimitiveIterator and java.util.function.Supplier,
// plus Hasher, Shape and HashFunctionIdentity from the bloomfilter hasher package.
public static class CollectionHasher implements Hasher {
    private final Shape shape;
    private final Supplier<PrimitiveIterator.OfInt> func;

    /** Creates a hasher from a function that returns a fresh iterator of bit indexes on each call. */
    CollectionHasher(Supplier<PrimitiveIterator.OfInt> func, Shape shape) {
        this.shape = shape;
        this.func = func;
    }

    /** Creates a hasher from a collection of bit indexes. */
    CollectionHasher(Collection<Integer> collection, Shape shape) {
        // Each call re-streams the collection, so the hasher can be iterated more than once.
        this(() -> collection.stream().mapToInt(Integer::intValue).iterator(), shape);
    }

    @Override
    public PrimitiveIterator.OfInt getBits(Shape shape) {
        if (!this.shape.equals(shape)) {
            throw new IllegalArgumentException(String.format(
                "Hasher shape (%s) is not the same as shape (%s)", this.shape, shape));
        }
        return func.get();
    }

    @Override
    public HashFunctionIdentity getHashFunctionIdentity() {
        return shape.getHashFunctionIdentity();
    }

    @Override
    public boolean isEmpty() {
        return !func.get().hasNext();
    }
}
{code}

On Sun, Mar 8, 2020 at 12:39 AM Alex Herbert <alex.d.herb...@gmail.com> wrote:

>
> > On 6 Mar 2020, at 02:14, Alex Herbert <alex.d.herb...@gmail.com> wrote:
> >
> > The change to make the CountingBloomFilter an interface is in this PR [1].
>
> Claude has stated in a review of the PR on GitHub that the change to
> CountingBloomFilter as an interface is good.
>
> I will now progress to updating the BloomFilter interface as previously
> discussed and put that into a PR. Changes would be:
>
> - boolean return values from the merge operations.
> - remove getHasher() and switch to providing an iterator of enabled indexes
>
> As per below:
>
> public interface BloomFilter {
>
>     int andCardinality(BloomFilter other);
>
>     int cardinality();
>
>     boolean contains(BloomFilter other);
>
>     boolean contains(Hasher hasher);
>
>     long[] getBits();
>
>     // Change
>     PrimitiveIterator.OfInt iterator();
>
>     Shape getShape();
>
>     // Change
>     boolean merge(BloomFilter other);
>
>     // Change
>     boolean merge(Hasher hasher);
>
>     int orCardinality(BloomFilter other);
>
>     int xorCardinality(BloomFilter other);
>
> }
>
> Given the CountingBloomFilter provides a forEach(BitCountConsumer) method
> it may be useful to also have the following method to receive all the
> enabled indexes:
>
> forEach(IntConsumer)
>
> Thus you can use the iterator of indexes for fail-fast checking against
> each index, or use the forEach method when you know you want to process all
> the bit indexes. In many cases the forEach can be more efficiently
> implemented than an iterator and would avoid an iterator object creation.
>
> > [1] https://github.com/apache/commons-collections/pull/137
> > <https://github.com/apache/commons-collections/pull/137>

-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
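To illustrate the second purpose of StaticHasher discussed in this thread (merging a collection of indexes into a filter without building a temporary Bloom filter), here is a minimal standalone sketch. It deliberately avoids the commons-collections classes: `BitMap`, its `merge(PrimitiveIterator.OfInt)` method and `fromCollection` are hypothetical names, and the bitmap only stands in for a Bloom filter with a fixed number of bits. The `Supplier` hands out a fresh iterator on every call, which is what lets a CollectionHasher-style object be checked for emptiness and then merged.

```java
import java.util.Collection;
import java.util.List;
import java.util.PrimitiveIterator;
import java.util.function.Supplier;

public class DirectMergeSketch {

    /** Hypothetical stand-in for a Bloom filter: a fixed-size bitmap backed by long words. */
    static final class BitMap {
        private final int numberOfBits;
        private final long[] bits;

        BitMap(int numberOfBits) {
            this.numberOfBits = numberOfBits;
            this.bits = new long[(numberOfBits + 63) >>> 6];
        }

        /** Sets each index produced by the iterator directly; no temporary filter is created. */
        void merge(PrimitiveIterator.OfInt indexes) {
            while (indexes.hasNext()) {
                int i = indexes.nextInt();
                if (i < 0 || i >= numberOfBits) {
                    throw new IndexOutOfBoundsException("Bit index out of range: " + i);
                }
                bits[i >>> 6] |= 1L << i; // the shift distance is taken mod 64 for a long
            }
        }

        boolean get(int i) {
            return (bits[i >>> 6] & (1L << i)) != 0;
        }

        int cardinality() {
            int count = 0;
            for (long word : bits) {
                count += Long.bitCount(word);
            }
            return count;
        }
    }

    /** Hypothetical helper: a re-iterable index source, as in the proposed CollectionHasher. */
    static Supplier<PrimitiveIterator.OfInt> fromCollection(Collection<Integer> collection) {
        return () -> collection.stream().mapToInt(Integer::intValue).iterator();
    }

    public static void main(String[] args) {
        Supplier<PrimitiveIterator.OfInt> source = fromCollection(List.of(3, 17, 64, 99, 17));
        // Each get() returns a new iterator, so the emptiness check does not consume the data.
        boolean empty = !source.get().hasNext();

        BitMap filter = new BitMap(100);
        filter.merge(source.get());

        System.out.println(empty);                // false
        System.out.println(filter.cardinality()); // 4 (17 is listed twice)
        System.out.println(filter.get(64));       // true
    }
}
```

Duplicate indexes are harmless here because setting a bit twice leaves it set, which matches how a Bloom filter merge behaves.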
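The trade-off Alex describes between iterator() and forEach(IntConsumer) can be sketched on a plain long[] bitmap. This is an illustration under stated assumptions, not the proposed BloomFilter implementation: `IndexedBits`, `isSet` and `isSubsetOf` are hypothetical names. `forEach` pushes every enabled index to the consumer in a tight loop with no iterator allocation, while `iterator()` lets the caller pull indexes one at a time and stop at the first one that fails a check.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;
import java.util.function.IntConsumer;

public class EnabledIndexesSketch {

    /** Hypothetical bit store exposing its enabled indexes in both push and pull style. */
    static final class IndexedBits {
        private final long[] bits;

        IndexedBits(long[] bits) {
            this.bits = bits.clone();
        }

        boolean isSet(int index) {
            int w = index >>> 6;
            return w < bits.length && (bits[w] & (1L << index)) != 0;
        }

        /** Push style: visits every enabled index; no iterator object is allocated. */
        void forEach(IntConsumer consumer) {
            for (int w = 0; w < bits.length; w++) {
                long word = bits[w];
                while (word != 0) {
                    consumer.accept((w << 6) + Long.numberOfTrailingZeros(word));
                    word &= word - 1; // clear the lowest set bit
                }
            }
        }

        /** Pull style: the caller decides when to stop. */
        PrimitiveIterator.OfInt iterator() {
            return new PrimitiveIterator.OfInt() {
                private int wordIndex = 0;
                private long word = bits.length == 0 ? 0 : bits[0];

                @Override
                public boolean hasNext() {
                    // Skip empty words until a set bit is found or the array ends.
                    while (word == 0 && wordIndex + 1 < bits.length) {
                        word = bits[++wordIndex];
                    }
                    return word != 0;
                }

                @Override
                public int nextInt() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    int index = (wordIndex << 6) + Long.numberOfTrailingZeros(word);
                    word &= word - 1; // clear the bit just returned
                    return index;
                }
            };
        }
    }

    /** Fail-fast check: true if every index enabled in a is also enabled in b. */
    static boolean isSubsetOf(IndexedBits a, IndexedBits b) {
        PrimitiveIterator.OfInt it = a.iterator();
        while (it.hasNext()) {
            if (!b.isSet(it.nextInt())) {
                return false; // stop at the first missing index
            }
        }
        return true;
    }

    public static void main(String[] args) {
        IndexedBits a = new IndexedBits(new long[] {0b1010L, 1L}); // indexes 1, 3, 64
        IndexedBits b = new IndexedBits(new long[] {0b1110L, 1L}); // indexes 1, 2, 3, 64
        StringBuilder sb = new StringBuilder();
        a.forEach(i -> sb.append(i).append(' '));
        System.out.println(sb.toString().trim()); // 1 3 64
        System.out.println(isSubsetOf(a, b));     // true
        System.out.println(isSubsetOf(b, a));     // false
    }
}
```

Both styles scan words with `Long.numberOfTrailingZeros` and clear each bit with `word &= word - 1`, so an empty word costs a single comparison; the difference is only who controls the loop.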