The link I meant to include did not carry over in my last post. The
paper can be found at:
https://webdocs.cs.ualberta.ca/~drafiei/papers/DupDet06Sigmod.pdf

On Thu, Jun 8, 2023 at 9:05 AM Claude Warren <cla...@apache.org> wrote:

>
> Have you considered using Stable Bloom Filters [1]? I think they do what
> you want without much of the overhead in your proposed solution. In
> addition, you may want to look at Commons-Collections v4.5 [2] (currently
> a snapshot) for efficient Bloom filter code. I have a Stable Bloom filter
> implementation based on commons-collections somewhere.
>
> [1] Deng, Fan; Rafiei, Davood (2006), "Approximately Detecting Duplicates
> for Streaming Data using Stable Bloom Filters", Proceedings of the ACM
> SIGMOD Conference, pp. 25–36.
>
> [2]
> https://github.com/apache/commons-collections/tree/master/src/main/java/org/apache/commons/collections4/bloomfilter
>
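For anyone who has not read the paper yet, the core update rule of a Stable Bloom filter can be sketched in Java roughly as below. This is a minimal, self-contained sketch under stated assumptions, not the commons-collections based implementation mentioned above: the class name StableBloomFilter, the FNV-1a double hashing, and the parameter names (m cells, k probes per element, p random decrements per element, d bits per cell) are illustrative choices. For each element it (1) reports a possible duplicate if all k cells are non-zero, (2) decrements p randomly chosen cells, and (3) sets the element's k cells to the maximum value 2^d - 1.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

/**
 * Minimal Stable Bloom filter sketch (after Deng & Rafiei, 2006).
 * Hypothetical names: the class, the FNV-1a double hashing and the
 * parameters m, k, p, d are illustrative, not a library API.
 */
public class StableBloomFilter {
    private final byte[] cells;   // m counters, each in [0, max]
    private final int k;          // cells probed and set per element
    private final int p;          // random cells decremented per element
    private final byte max;       // 2^d - 1
    private final Random random;

    public StableBloomFilter(int m, int k, int p, int d, long seed) {
        if (m <= 0 || k <= 0 || p < 0 || d < 1 || d > 7) {
            throw new IllegalArgumentException("require m>0, k>0, p>=0, 1<=d<=7");
        }
        this.cells = new byte[m];
        this.k = k;
        this.p = p;
        this.max = (byte) ((1 << d) - 1);
        this.random = new Random(seed);
    }

    /** Returns true if the item is reported as a (possible) duplicate, then records it. */
    public boolean offer(String item) {
        long h = fnv1a64(item.getBytes(StandardCharsets.UTF_8));
        int h1 = (int) h;
        int h2 = (int) (h >>> 32) | 1; // odd step for the double-hashing probes
        int m = cells.length;

        // 1. Duplicate check: every one of the k cells is non-zero.
        boolean duplicate = true;
        for (int i = 0; i < k; i++) {
            if (cells[index(h1, h2, i, m)] == 0) {
                duplicate = false;
                break;
            }
        }
        // 2. Decrement p randomly chosen cells, never going below zero.
        for (int i = 0; i < p; i++) {
            int j = random.nextInt(m);
            if (cells[j] > 0) {
                cells[j]--;
            }
        }
        // 3. Set the element's k cells to the maximum value.
        for (int i = 0; i < k; i++) {
            cells[index(h1, h2, i, m)] = max;
        }
        return duplicate;
    }

    private static int index(int h1, int h2, int i, int m) {
        return Math.floorMod(h1 + i * h2, m);
    }

    // 64-bit FNV-1a hash (assumed hash choice for this sketch).
    private static long fnv1a64(byte[] data) {
        long hash = 0xcbf29ce484222325L;
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= 0x100000001b3L;
        }
        return hash;
    }

    public static void main(String[] args) {
        StableBloomFilter sbf = new StableBloomFilter(1 << 16, 3, 10, 3, 42L);
        System.out.println(sbf.offer("event-1")); // false: first sighting
        System.out.println(sbf.offer("event-1")); // true: seen just before
    }
}
```

The random decrements are what let old entries fade: over an unbounded stream the fraction of zero cells, and with it the false positive rate, settles to a stable value, at the price of occasional false negatives for items last seen long ago.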
