The link I thought I included did not carry over in the last post. The paper can be found at: https://webdocs.cs.ualberta.ca/~drafiei/papers/DupDet06Sigmod.pdf
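For anyone who does not want to open the PDF first, here is a rough sketch of the idea as I understand it from the paper: each cell is a small counter, every insert first decrements a few cells and then sets the element's own cells to a maximum value, so old entries decay and the filter never fills up. This is not the commons-collections API or my implementation mentioned below. The class name, the parameters (m, k, p, max), the hashing scheme and the choice to decrement a contiguous run of cells are all assumptions made for illustration.

```java
import java.util.Random;

/** Illustrative Stable Bloom Filter sketch; names and parameters are hypothetical. */
public class StableBloomFilterSketch {
    private final byte[] cells; // one small counter per cell, 0 means "empty"
    private final int k;        // number of cells set per element
    private final int p;        // number of cells decremented per insert
    private final byte max;     // value a cell is set to (2^d - 1 for d-bit cells, at most 127 here)
    private final Random rnd;

    public StableBloomFilterSketch(int m, int k, int p, int max, long seed) {
        this.cells = new byte[m];
        this.k = k;
        this.p = p;
        this.max = (byte) max;
        this.rnd = new Random(seed);
    }

    // Simple double hashing over String.hashCode(); an assumption, not the paper's hash functions.
    private int index(String item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1;
        return Math.floorMod(h1 + i * h2, cells.length);
    }

    /** Returns true if the item looks like a duplicate, then records it. */
    public boolean checkAndAdd(String item) {
        // 1. Duplicate check: every one of the k cells must be non-zero.
        boolean seen = true;
        for (int i = 0; i < k; i++) {
            if (cells[index(item, i)] == 0) {
                seen = false;
                break;
            }
        }
        // 2. Decay: decrement p cells, starting at a random position.
        int start = rnd.nextInt(cells.length);
        for (int j = 0; j < p; j++) {
            int c = (start + j) % cells.length;
            if (cells[c] > 0) {
                cells[c]--;
            }
        }
        // 3. Record: set the element's k cells to the maximum value.
        for (int i = 0; i < k; i++) {
            cells[index(item, i)] = max;
        }
        return seen;
    }
}
```

With something like new StableBloomFilterSketch(1024, 3, 10, 7, 42L), the first checkAndAdd("a") returns false and an immediate second call returns true. Unlike a plain Bloom filter, the decay step means an old element can later be reported as new (a false negative), which is the trade-off the paper analyses.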
On Thu, Jun 8, 2023 at 9:05 AM Claude Warren <cla...@apache.org> wrote:
>
> Have you considered using Stable Bloom Filters [1]? I think they do what
> you want without a lot of the overhead you propose for your solution. In
> addition, you may want to look at Commons-Collections v4.5 [2] (currently
> snapshot) for efficient Bloom filter code. I have a Stable Bloom filter
> implementation based on commons-collections somewhere.
>
> [1] Deng, Fan; Rafiei, Davood (2006), "Approximately Detecting Duplicates
> for Streaming Data using Stable Bloom Filters", Proceedings of the ACM
> SIGMOD Conference (PDF), pp. 25–36
>
> [2]
> https://github.com/apache/commons-collections/tree/master/src/main/java/org/apache/commons/collections4/bloomfilter
>