Hi,

Could someone help me understand a behavior I see when trying to union some
HLL sketches?

I have 14 HLL sketches, and I know the exact unique counts for each of
them. All the individual sketches give estimates within 2% of the exact
counts.

When I create a union using the default lgMaxK parameter, the total estimate
is way off (25% larger than the exact count).

However, reducing the union's lgMaxK parameter to 4 or 5 gives an estimate
within 2.5% of the exact count.

Also, one particular sketch seems to make the final estimate jump; leaving
that sketch out of the union keeps the result close to the exact count.

Am I just seeing a very bad random error, or am I doing something wrong
with the unions?

I'm running on Java, using version 1.3.0. In case it helps, the sketches are
in the linked gist (hex-encoded, one per line):
https://gist.github.com/mmusnjak/c00a72b3dfbc52e780c2980acfd98351
and the exact counts:
https://gist.github.com/mmusnjak/dcbff67101be6cfc28ba01e63e41f73c
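For anyone reproducing this, here is a minimal JDK-only sketch, assuming the gist holds one hex string per line, of how a line can be decoded to bytes and how an estimate's relative error against the exact count can be computed. The class and method names are hypothetical. The decoded bytes would then be handed to the library (presumably `HllSketch.heapify(byte[])`) before the union step, which is not shown here.

```java
// Hypothetical helper class; names are illustrative, not part of any library.
public class SketchCheck {

    // Decodes one hex-encoded line (e.g. one line of the gist) into raw bytes.
    static byte[] hexToBytes(String hex) {
        String s = hex.trim();
        if (s.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string");
        }
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(s.charAt(2 * i), 16);
            int lo = Character.digit(s.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("invalid hex digit near index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // Signed relative error of an estimate vs. the exact count (0.25 means 25% too high).
    static double relativeError(double estimate, long exact) {
        return (estimate - exact) / exact;
    }

    public static void main(String[] args) {
        byte[] b = hexToBytes("0a01FF");
        System.out.println(b.length);
        System.out.println(relativeError(125.0, 100L));
    }
}
```

Checking each single sketch and each union this way, including a leave-one-out run that drops one sketch at a time, makes it easier to pin down which input causes the jump.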

Thank you!
Marko Musnjak
