I do not think concurrent write usage is a general use case for the filter. So this seems like a specialisation. If implementing it would harm performance then I would be against it for the base implementation.
For specialisation I would prefer the JDK pattern used for synchronized
collections, which wrap the underlying implementation in a class, e.g. a
BloomFilters utility class:

    public static BloomFilter synchronizedBloomFilter(BloomFilter)
    public static CountingBloomFilter synchronizedCountingBloomFilter(CountingBloomFilter)

You can then make any implementation thread safe (a minimal sketch follows
the quoted message below). If your suggested solution is to use a lock
wrapping the entire method body, then the effect of a synchronized call
wrapping the implementation is the same.

This would satisfy the use case of building a single filter using multiple
threads. Note however that the same can be achieved by building multiple
filters, one per thread, and then merging them all (also sketched below).

Note that it is dangerous to think that only some methods require
synchronization. E.g. if a filter's "contains" method does not alter the
internal state then it could be left unsynchronized. But if a concurrent
thread changes the state then the result of the method call can change.

Filters with complex internal state that may reallocate memory will require
synchronization around all methods, e.g. SparseBloomFilter. Without this
there may be concurrent modification exceptions from internal data
structures that are traversed while being updated concurrently.

For a SimpleBloomFilter there can be no structural modification after
construction (the internal backing array is final). But concurrent writes
to the array elements are not guaranteed to be atomic or immediately
visible to other threads, so data corruption can occur.

For the case of reads (i.e. calls to contains), the result is probabilistic
anyway, so some error due to a concurrent update while reading may be
acceptable.

What is your use case? It may be simpler to provide a static helper class
to wrap access to the methods you are using that can change the filter
(merge/cardinality). Your pool of threads can then call 'contains' without
synchronization overhead, and any threads that update the filter use the
helper:

    // Operations synchronized on the filter that is modified.
    // Applies only to filters with no object reallocation on modification.
    static class FilterHelper {
        boolean merge(BloomFilter bf1, BloomFilter bf2) {
            synchronized (bf1) {
                return bf1.merge(bf2);
            }
        }
    }

Thus you effectively synchronize writes but do not restrict reads. Some
level of probabilistic error on read would have to be tolerated.

Alex

On Sun, 2 Jul 2023 at 07:40, Claude Warren <cla...@xenei.com> wrote:
>
> I have been thinking about what it would take to make SimpleBloomFilter
> thread safe. I think that the answer is to use a copy-on-write strategy
> and a lock within all the merge methods.
>
> However, this leaves the problem of the cardinality calculation.
> Currently it is lazily performed and cached. I am thinking that there
> are 2 solutions.
>
> 1. Mark cardinality volatile and leave the lazy calculation as it is,
>    since it effectively does a copy on write.
> 2. Update the cardinality inside the merge. This should be doable as we
>    can get the number of bits in each long after the merge and simply
>    add them up during processing.
>
> Does anyone see a problem with this solution?
>
> Claude
> --
> LinkedIn: http://www.linkedin.com/in/claudewarren
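
P.S. A minimal sketch of the wrapper pattern mentioned above. The
BloomFilter interface here is a cut-down, hypothetical subset so the
example stands alone; the real interface has more methods, all of which
would need the same treatment:

    import java.util.Objects;

    // Hypothetical cut-down interface, standing in for the real BloomFilter.
    interface BloomFilter {
        boolean merge(BloomFilter other);
        boolean contains(BloomFilter other);
        int cardinality();
    }

    final class BloomFilters {
        private BloomFilters() {}

        // Every method, reads included, locks the wrapper instance,
        // mirroring java.util.Collections.synchronizedCollection.
        public static BloomFilter synchronizedBloomFilter(final BloomFilter filter) {
            Objects.requireNonNull(filter, "filter");
            return new BloomFilter() {
                @Override
                public synchronized boolean merge(BloomFilter other) {
                    return filter.merge(other);
                }

                @Override
                public synchronized boolean contains(BloomFilter other) {
                    return filter.contains(other);
                }

                @Override
                public synchronized int cardinality() {
                    return filter.cardinality();
                }
            };
        }
    }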
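
And a sketch of the build-one-filter-per-thread alternative. Plain long[]
bit maps stand in here for thread-confined filters (SimpleBloomFilter is
backed by such an array); merging two filters of the same shape is just a
bitwise OR of their bit maps, done on a single thread at the end, so no
synchronization is needed anywhere:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class PerThreadBuild {
        static final int LONGS = 1024; // 64 * 1024 = 65536 bits

        public static void main(String[] args) throws Exception {
            final int threads = 4;
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            List<Future<long[]>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int id = t;
                parts.add(pool.submit(() -> {
                    long[] bits = new long[LONGS]; // thread-local filter state
                    // ... hash this thread's items and set their bits, e.g.:
                    bits[id % LONGS] |= 1L << id;
                    return bits;
                }));
            }
            long[] merged = new long[LONGS];
            for (Future<long[]> f : parts) {
                long[] part = f.get(); // Future.get gives safe publication
                for (int i = 0; i < LONGS; i++) {
                    merged[i] |= part[i]; // merge is a bitwise OR
                }
            }
            pool.shutdown();
            int cardinality = Arrays.stream(merged).mapToInt(Long::bitCount).sum();
            System.out.println("merged cardinality = " + cardinality);
        }
    }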
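
Finally, a sketch of option 2 from Claude's mail: recompute the cardinality
during the merge pass using Long.bitCount, so the lazy cached calculation
is no longer needed. The class and field names are illustrative, not the
actual SimpleBloomFilter internals:

    final class CardinalityTrackingFilter {
        private final long[] bitMap = new long[1024]; // final: no structural change
        private volatile int cardinality;             // volatile: visible to readers

        // Assumes both filters share the same shape (equal bit map lengths).
        synchronized boolean merge(long[] otherBitMap) {
            int count = 0;
            boolean changed = false;
            for (int i = 0; i < bitMap.length; i++) {
                final long merged = bitMap[i] | otherBitMap[i];
                changed |= merged != bitMap[i];
                bitMap[i] = merged;
                count += Long.bitCount(merged); // add up the bits during processing
            }
            cardinality = count; // one volatile write publishes the new count
            return changed;
        }

        // Readers see the count from the last completed merge without locking.
        // Unsynchronized reads of the bit map itself remain probabilistic, as
        // discussed above.
        int cardinality() {
            return cardinality;
        }
    }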