Hi David,
Thanks for the response. Yeah, Bloom filters are mostly for existence (membership) checks.
I'm looking for a way to preprocess data and then perform operations like
union/intersection on the results to get counts. Example: the number of distinct
users who visited website A over the last 5 days (a union of daily sets),
intersected with the distinct visitors to website B over the last 10 days (also a union).
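For reference, here is a minimal exact baseline for that query. It assumes per-day visitor IDs are already available as Python sets; the function name and toy data are hypothetical. Any sketch (HLL or Roaring) could be checked against this:

```python
from functools import reduce


def overlap_count(daily_a: list[set[str]], daily_b: list[set[str]]) -> int:
    """Distinct users seen on site A over its window AND on site B over its window."""
    visitors_a = reduce(set.union, daily_a, set())  # union over A's days
    visitors_b = reduce(set.union, daily_b, set())  # union over B's days
    return len(visitors_a & visitors_b)             # intersect, then count


# Hypothetical data: 5 days for site A, 10 days for site B
days_a = [{"u1", "u2"}, {"u2", "u3"}, {"u4"}, set(), {"u1"}]
days_b = [{"u3"}, {"u4", "u9"}] + [set()] * 8
print(overlap_count(days_a, days_b))  # 2 (u3 and u4)
```

Exact sets cost memory proportional to the number of distinct users, which is the motivation for looking at sketches or compressed bitmaps.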

HyperLogLog is the right tool for this, but if someone has already benchmarked
HLL against Roaring Bitmap, it would save me a lot of time.
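For comparison, a toy HyperLogLog in pure Python (class and function names are hypothetical; in practice you would use a library or the existing Presto/Hive UDFs). It shows the two operations the query needs: unions are cheap (register-wise max, which equals the sketch of the combined stream), while intersections are not native to HLL and have to be derived by inclusion-exclusion, |A ∩ B| = |A| + |B| - |A ∪ B|. This is a simplified form of the original estimator with only the small-range correction.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog (illustrative only): 2**p registers, 64-bit SHA-1 prefix hash."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                       # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # Union = register-wise max; same result as feeding both streams to one sketch.
        if self.p != other.p:
            raise ValueError("sketches must use the same precision p")
        out = HyperLogLog(self.p)
        out.registers = [max(x, y) for x, y in zip(self.registers, other.registers)]
        return out

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)   # linear counting for small sets
        return raw


def intersection_estimate(a: HyperLogLog, b: HyperLogLog) -> float:
    # HLL has no native intersection: |A ∩ B| = |A| + |B| - |A ∪ B|
    return a.count() + b.count() - a.merge(b).count()


# Usage: multi-day unions are just repeated merges of per-day sketches.
site_a, site_b = HyperLogLog(), HyperLogLog()
for i in range(10_000):
    site_a.add(f"user{i}")
for i in range(5_000, 15_000):
    site_b.add(f"user{i}")
print(round(site_a.merge(site_b).count()), round(intersection_estimate(site_a, site_b)))
```

With p = 12 the relative standard error of each estimate is about 1.04/√4096 ≈ 1.6%. Because the intersection subtracts estimates, its absolute error scales with the union size, so a small overlap between large sets comes out noisy. That trade-off is worth measuring against exact bitmap counts in any benchmark.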
Thanks,
Nitin

On Fri, Dec 8, 2017 7:08 PM, David Capwell <dcapw...@gmail.com> wrote:
Think of it as a more dynamic Bloom filter. It works well when cardinality is low,
but its size grows quickly and ends up costing more than a Bloom filter as cardinality grows.
This data structure supports existence queries, but your email sounds like you
want counts. If so, it's not really the best fit.
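To make the "dynamic Bloom filter" comparison concrete, here is a toy model of Roaring's layout (a simplified sketch with hypothetical names, not the real library). A 32-bit value is split into a high 16-bit key that selects a container and a low 16-bit value stored in it. Sparse containers are sorted arrays (2 bytes per value); once a container holds more than 4096 values it switches to a fixed 8 KiB bitmap. Unlike a Bloom filter, membership is exact, and memory tracks the stored values instead of being sized up front. Real Roaring also has run-length containers, omitted here.

```python
from bisect import bisect_left

ARRAY_LIMIT = 4096          # above this many values, an array container becomes a bitmap
BITMAP_BYTES = 65536 // 8   # a bitmap container covers 2**16 values in 8 KiB


class MiniRoaring:
    def __init__(self):
        self.containers = {}  # high 16 bits -> sorted list[int] or bytearray bitmap

    def add(self, x: int) -> None:
        hi, lo = x >> 16, x & 0xFFFF
        c = self.containers.get(hi)
        if c is None:
            self.containers[hi] = [lo]
        elif isinstance(c, list):
            i = bisect_left(c, lo)
            if i == len(c) or c[i] != lo:      # skip duplicates
                c.insert(i, lo)
                if len(c) > ARRAY_LIMIT:       # convert dense container to a bitmap
                    bm = bytearray(BITMAP_BYTES)
                    for v in c:
                        bm[v >> 3] |= 1 << (v & 7)
                    self.containers[hi] = bm
        else:
            c[lo >> 3] |= 1 << (lo & 7)

    def __contains__(self, x: int) -> bool:
        hi, lo = x >> 16, x & 0xFFFF
        c = self.containers.get(hi)
        if c is None:
            return False
        if isinstance(c, list):
            i = bisect_left(c, lo)
            return i < len(c) and c[i] == lo
        return bool(c[lo >> 3] & (1 << (lo & 7)))

    def size_bytes(self) -> int:
        # Approximate payload: 2 bytes per array entry, 8 KiB per bitmap container.
        return sum(2 * len(c) if isinstance(c, list) else BITMAP_BYTES
                   for c in self.containers.values())


r = MiniRoaring()
for uid in range(0, 10_000, 2):
    r.add(uid)
print(9998 in r, 9999 in r)  # True False
print(r.size_bytes())        # 8192: container 0 switched to a bitmap
```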

On Dec 8, 2017 5:00 PM, "Nitin Vijayvargiya" <nitinvija...@gmail.com> wrote:
Hi all,
I'm working on speeding up distinct count calculations, and it looks like
roaring bitmaps (RB) are the newest and meanest way to do set operations. Does anyone
here have experience with them? How did their performance compare to HyperLogLog
and EWAH? A quick Google search showed me that it's easier to find UDF
implementations of HyperLogLog for Presto and Hive, but if the hype is real, it
might be worth spending the time to incorporate RB. Also, if anyone can point me
to reliable implementations of UDFs using RB, I would love to check them out and
test them myself =)
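The set operations that make bitmaps attractive for distinct counts can be sketched with a plain uncompressed bitmap, assuming users are mapped to dense non-negative integer IDs (the helper names and data are hypothetical). Union and intersection become bitwise OR/AND and the count is a popcount. Roaring applies the same operations container by container while compressing sparse ranges, so the counts are exact, at the cost of memory that grows with the data.

```python
def to_bitmap(user_ids) -> int:
    """Pack non-negative integer user IDs into a Python int used as a bitset."""
    bm = 0
    for uid in user_ids:
        bm |= 1 << uid
    return bm


def popcount(bm: int) -> int:
    return bin(bm).count("1")  # int.bit_count() also works on Python 3.10+


# Hypothetical per-day visitor IDs
site_a_days = [[1, 2, 3], [3, 4], [5], [1], [2, 8]]   # last 5 days
site_b_days = [[4, 9], [8], [10, 11]] + [[]] * 7      # last 10 days

a = 0
for day in site_a_days:
    a |= to_bitmap(day)   # union across days = bitwise OR
b = 0
for day in site_b_days:
    b |= to_bitmap(day)

print(popcount(a), popcount(b), popcount(a & b))  # 6 5 2
```

A plain int bitmap spends one bit per possible ID up to the largest one, so sparse or large IDs waste memory; that gap is what Roaring's array and bitmap containers address.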
Happy Holidays!
Nitin
