Are you trying to add an HLL UDAF for Hive? If so, recent versions of Hive already
have an implementation of HLL++ that does not need a bitset.
https://github.com/apache/hive/tree/master/standalone-metastore/src/main/java/org/apache/hadoop/hive/common/ndv/hll
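To illustrate why HyperLogLog needs no bitset, here is a minimal sketch of the idea in Java. Each value is hashed, the top p bits of the hash pick one of 2^p one-byte registers, and each register keeps the largest "rank" (leading zeros in the remaining bits, plus one) it has seen. This is plain HyperLogLog with a linear-counting fallback, not Hive's HLL++ class: there is no sparse representation and no bias-correction table. The class name `SimpleHll` and the SplitMix64-style hash are assumptions chosen for illustration.

```java
/**
 * Hypothetical minimal HyperLogLog sketch (NOT Hive's HyperLogLog API).
 * Uses 2^p byte registers instead of a bitset; no HLL++ sparse mode
 * or bias-correction tables.
 */
final class SimpleHll {
    private final int p;            // number of register-index bits
    private final int m;            // number of registers = 2^p
    private final byte[] registers; // max rank seen per register

    SimpleHll(int p) {
        // alpha formula below assumes m >= 128, i.e. p >= 7
        if (p < 7 || p > 18) throw new IllegalArgumentException("p must be in [7, 18]");
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    /** SplitMix64-style mixer, used here as the 64-bit hash (an assumption). */
    static long hash64(long x) {
        long z = x + 0x9e3779b97f4a7c15L;
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    void addLong(long value) {
        long h = hash64(value);
        int idx = (int) (h >>> (64 - p));  // top p bits select the register
        long w = h << p;                   // remaining 64 - p bits
        // rank = position of the first 1-bit in w, capped at 64 - p + 1
        int rank = Math.min(Long.numberOfLeadingZeros(w), 64 - p) + 1;
        if (rank > registers[idx]) registers[idx] = (byte) rank;
    }

    /** Union of two sketches: register-wise max. Both must use the same p. */
    void merge(SimpleHll other) {
        if (other.p != p) throw new IllegalArgumentException("precision mismatch");
        for (int i = 0; i < m; i++) {
            registers[i] = (byte) Math.max(registers[i], other.registers[i]);
        }
    }

    long estimate() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.scalb(1.0, -r); // adds 2^-r
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m);
        double raw = alpha * m * m / sum;  // harmonic-mean estimator
        // Small-range correction: linear counting on empty registers
        if (raw <= 2.5 * m && zeros > 0) {
            return Math.round(m * Math.log((double) m / zeros));
        }
        return Math.round(raw);
    }
}
```

Re-adding the same values never changes a register, so the estimate counts distinct values; and because union is just a register-wise max, sketches built per partition or per day can be combined cheaply.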
Also, the bloom filter implementation in Hive…
Hi Prasanth,
Thanks, that was exactly what I was looking for. My main concern is speed, so I
tried the Brickhouse implementation of HLL++ and had to make minor
modifications to the code to get it running. The only remaining issue is
that the precision check tests don't always pass…
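HLL estimates are probabilistic, so a precision test with a fixed tolerance is expected to fail some fraction of the time. The relative standard error of HyperLogLog with m = 2^p registers is about 1.04 / sqrt(m), so one option is to express the tolerance as k standard errors. The helper below is a hypothetical sketch of that idea, not Brickhouse's test code; the class and method names are illustrative.

```java
/** Hypothetical helper for choosing tolerances in HLL precision tests. */
final class HllErrorBounds {
    /** Relative standard error of HyperLogLog with 2^p registers: 1.04 / sqrt(2^p). */
    static double relativeStdError(int p) {
        return 1.04 / Math.sqrt((double) (1L << p));
    }

    /** True if the estimate is within k standard errors of the exact count. */
    static boolean withinKSigma(long estimate, long exact, int p, double k) {
        double allowed = k * relativeStdError(p) * exact;
        return Math.abs(estimate - exact) <= allowed;
    }
}
```

For p = 14 the relative standard error is about 0.81%. Under a normal approximation, a 1-sigma tolerance fails roughly 32% of the time and a 3-sigma tolerance roughly 0.3%, so tests that assert a tight fixed percentage can be flaky even when the sketch is working correctly.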
I did a performance benchmark of roaring bitmaps when I added bloom filters
(HyperLogLog also shares the same bitset implementation) to ORC and Hive.
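For context on the plain-bitset side, a Bloom filter backed by a `long[]` needs only a few shifts and masks per hash function on insert and probe. The class below is a hypothetical sketch, not the ORC/Hive BloomFilter API: it derives k bit positions from one 64-bit hash by double hashing (h1 + i*h2) and sizes itself with the textbook formulas m = -n ln(fpp) / (ln 2)^2 bits and k = (m/n) ln 2 hash functions. The SplitMix64-style hash is an assumption for illustration.

```java
/**
 * Hypothetical Bloom filter over a plain long[] bitset
 * (not the ORC/Hive BloomFilter class).
 */
final class LongBitsetBloomFilter {
    private final long[] bits;   // bit i lives in bits[i / 64]
    private final int numBits;
    private final int numHashes;

    LongBitsetBloomFilter(long expectedEntries, double fpp) {
        if (expectedEntries <= 0 || fpp <= 0.0 || fpp >= 1.0) {
            throw new IllegalArgumentException("need expectedEntries > 0 and 0 < fpp < 1");
        }
        double ln2 = Math.log(2);
        // m = -n ln(fpp) / (ln 2)^2 bits, k = (m / n) ln 2 hash functions
        long m = (long) Math.ceil(-expectedEntries * Math.log(fpp) / (ln2 * ln2));
        this.numBits = (int) Math.max(64L, Math.min(m, Integer.MAX_VALUE - 63L));
        this.numHashes = Math.max(1, (int) Math.round((double) numBits / expectedEntries * ln2));
        this.bits = new long[(numBits + 63) / 64];
    }

    private static long hash64(long x) {
        long z = x + 0x9e3779b97f4a7c15L;
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    private int position(int h1, int h2, int i) {
        int combined = h1 + i * h2;             // double hashing; int overflow wraps
        if (combined < 0) combined = ~combined; // flip to a non-negative value
        return combined % numBits;
    }

    void addLong(long value) {
        long h = hash64(value);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        for (int i = 1; i <= numHashes; i++) {
            int pos = position(h1, h2, i);
            bits[pos >>> 6] |= 1L << pos;       // long shifts use only the low 6 bits of pos
        }
    }

    boolean mightContainLong(long value) {
        long h = hash64(value);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        for (int i = 1; i <= numHashes; i++) {
            int pos = position(h1, h2, i);
            if ((bits[pos >>> 6] & (1L << pos)) == 0) return false; // definitely absent
        }
        return true; // possibly present (false positives allowed)
    }
}
```

Every insert and probe touches at most k words of a flat array, which is the kind of access pattern a compressed container structure gives up in exchange for a smaller footprint.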
I found that roaring bitmaps are good at compression but at the cost of speed. In a
JMH benchmark, I observed a ~10x slowdown during insert and probe when using
Hi David,
Thanks for the response. Yeah, bloom filters are mostly for membership (existence) checks.
I'm looking for a way to preprocess data and then perform operations like
union/intersection between the preprocessed sets to find counts. Example: the number of distinct
users who visited website A over the last 5 days (union), inte…
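One way to sketch that preprocessing, assuming user IDs can be mapped to dense non-negative integers (the mapping step is not shown), is one bitmap per site per day: the multi-day union is a bitwise OR and the intersection is a bitwise AND. The code below uses `java.util.BitSet` as a stand-in for a compressed bitmap such as a roaring bitmap; the class and method names are hypothetical. Since the second half of the example is cut off, "website B" in the usage is only a hypothetical intersection partner. For sketches such as HLL that support union but not intersection, the intersection can be approximated by inclusion-exclusion, |A ∩ B| ≈ |A| + |B| - |A ∪ B|.

```java
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical sketch: exact distinct counts over per-day user bitmaps,
 * assuming user IDs are already mapped to dense non-negative ints.
 */
final class DailyUserBitmaps {
    /** Union of several days: each user is counted once however many days they visited. */
    static BitSet unionOf(List<BitSet> days) {
        BitSet result = new BitSet();
        for (BitSet day : days) result.or(day);
        return result;
    }

    /** Number of distinct users present in both sets (inputs are not modified). */
    static int intersectionCount(BitSet a, BitSet b) {
        BitSet tmp = (BitSet) a.clone();
        tmp.and(b);
        return tmp.cardinality();
    }

    /**
     * Inclusion-exclusion: |A intersect B| = |A| + |B| - |A union B|.
     * Useful with union-only sketches such as HyperLogLog; with estimated
     * inputs, the result inherits the absolute error of the union estimate.
     */
    static long intersectionFromUnion(long countA, long countB, long countUnion) {
        return countA + countB - countUnion;
    }

    /** Convenience builder for a day's bitmap from user IDs. */
    static BitSet of(int... userIds) {
        BitSet s = new BitSet();
        for (int id : userIds) s.set(id);
        return s;
    }
}
```

For example, `unionOf(lastFiveDaysOfSiteA).cardinality()` gives the distinct users of site A over the window. With HLL sketches the same union is a register-wise merge, but the inclusion-exclusion intersection becomes unreliable when the true intersection is small relative to the union.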