Think of it as a Bloom filter that's more dynamic.  It works well when
cardinality is low, but its size grows quickly and ends up costing more than
a Bloom filter as cardinality grows.

This data structure supports existence queries, but your email sounds like
you want counts.  If so, it's not really the best fit.
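
To make the size trade-off concrete, here is a rough back-of-the-envelope
sketch, not a measurement of any real library. It assumes a classic Bloom
filter sized up front for a fixed capacity, and a simplified Roaring layout
that counts only container payload (16 bits per value in an array container
of up to 4096 values, otherwise a 65536-bit bitmap container), ignoring
headers and run containers. The function names are made up for illustration.

```python
import math


def bloom_bits(capacity: int, fpr: float) -> int:
    """Bits needed by a classic Bloom filter sized for `capacity` items.

    Uses the standard formula m = -n * ln(p) / (ln 2)^2.
    """
    return math.ceil(-capacity * math.log(fpr) / (math.log(2) ** 2))


def roaring_bits(values) -> int:
    """Approximate payload bits for a Roaring-style layout (simplified).

    Values are grouped by their high 16 bits. Each group is stored either as
    an array container (16 bits per value, up to 4096 values) or as a bitmap
    container (a fixed 65536 bits). Headers and run containers are ignored.
    """
    per_chunk = {}
    for v in set(values):
        key = v >> 16
        per_chunk[key] = per_chunk.get(key, 0) + 1
    total = 0
    for count in per_chunk.values():
        total += 16 * count if count <= 4096 else 65536
    return total


if __name__ == "__main__":
    fixed = bloom_bits(1_000_000, 0.01)  # allocated once, regardless of fill
    small = roaring_bits(range(1_000))   # low cardinality
    scattered = roaring_bits(i * 4096 for i in range(1_000_000))  # sparse, high cardinality
    print(f"bloom (sized for 1M, 1% FPR): {fixed} bits")
    print(f"roaring, 1K values:           {small} bits")
    print(f"roaring, 1M scattered values: {scattered} bits")
```

Under these assumptions the bitmap is far smaller than the pre-sized filter
at low cardinality, and larger once a million values are spread thinly
across the key space. Dense ranges behave differently, since a full chunk
never costs more than one bitmap container.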

On Dec 8, 2017 5:00 PM, "Nitin Vijayvargiya" <nitinvija...@gmail.com> wrote:

Hi all,

I'm working on speeding up distinct count calculations, and it looks like
roaring bitmaps (RB) are the newest and meanest way to do set operations.
Anyone here have experience with them? How was the performance compared to
HyperLogLog and EWAH? A quick Google search showed me that it's easier to
find UDF implementations of HyperLogLog in Presto and Hive, but if the hype
is real, it might be worth spending the time to incorporate RB. Also, if
anyone can point me to reliable implementations of UDFs using RB, I would
love to check it out and test it myself =)

Happy Holidays!

Nitin
