I would suggest considering an alternative data structure: a Cuckoo Filter or a Golomb Compressed Sequence (GCS).
The GCS data structure was introduced in "Cache-, Hash- and Space-Efficient Bloom Filters"
<http://algo2.iti.kit.edu/documents/cacheefficientbloomfilters-jea.pdf>
by F. Putze, P. Sanders, and J. Singler. See section 4.

> We should discuss which exact implementation of Bloom filters is the best
> fit.
> @Fabian: There are also implementations of Bloom filters that use counting
> and therefore support deletes, but obviously this comes at the cost of a
> potentially higher space consumption.
>
> On 23.05.2018 at 11:29, Fabian Hueske <fhue...@gmail.com> wrote:
>> IMO, such a feature would be very interesting. However, my concern with
>> Bloom filters is that they are insert-only data structures, i.e., it is
>> not possible to remove keys once they have been added. This might render
>> the filter useless over time.
>> In a different thread (see discussion in FLINK-8918 [1]), you mentioned
>> that the Bloom filters would be growing.
>> If we keep them in memory, how can we prevent them from exceeding memory
>> boundaries over time?
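
To make the delete/space trade-off from the quoted thread concrete, here is a minimal sketch of a counting Bloom filter in Python. It is only an illustration, not a proposal for the Flink implementation: the class name, the default sizes, and the SHA-256 based double hashing are all assumptions chosen for brevity.

```python
import hashlib


class CountingBloomFilter:
    """Illustrative counting Bloom filter: one counter per slot instead of one bit."""

    def __init__(self, num_slots=1024, num_hashes=4):
        # Both parameters are hypothetical defaults, not tuned values.
        self.num_slots = num_slots
        self.num_hashes = num_hashes
        self.counters = [0] * num_slots

    def _slots(self, key):
        # Derive num_hashes slot indices from a single SHA-256 digest
        # using double hashing: index_i = (h1 + i * h2) mod num_slots.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_slots for i in range(self.num_hashes)]

    def add(self, key):
        for slot in self._slots(key):
            self.counters[slot] += 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.counters[slot] > 0 for slot in self._slots(key))

    def remove(self, key):
        # Only valid for keys that were really added earlier. Removing a key
        # that merely looks present (a false positive) corrupts the counters.
        if not self.might_contain(key):
            return False
        for slot in self._slots(key):
            self.counters[slot] -= 1
        return True


f = CountingBloomFilter()
f.add("key-a")
f.add("key-b")
f.remove("key-a")
print(f.might_contain("key-b"))
```

The extra space comes from the counters: implementations commonly use small fixed-width counters (4 bits is a typical choice) where a plain Bloom filter uses a single bit, so the filter is several times larger for the same false-positive rate.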