Hi all,
I'm working on speeding up distinct count calculations, and it looks like
roaring bitmaps (RB) are the newest and meanest way to do set operations. Does
anyone here have experience with them? How does their performance compare with
HyperLogLog and EWAH? A quick Google search showed me that UDF implementations
of HyperLogLog are easier to find for Presto and Hive, but if the hype is real,
it might be worth spending the time to incorporate RB. Also, if anyone can point
me to reliable UDF implementations that use RB, I would love to check them out
and test them myself =)
Happy Holidays!
Nitin