Hi Marko,

Could you please let us know two more things:

1) Which is the one particular sketch that causes the estimate to jump?
2) What is the exact unique count of the others without that sketch?

It sort of seems like the first sketch, but it's hard to know for sure since we don't know the true leave-one-out exact counts.

Thanks,
jon

On Thu, Aug 13, 2020 at 8:41 AM Marko Mušnjak <marko.musn...@gmail.com> wrote:

> Hi,
>
> Could someone help me understand a behavior I see when trying to union
> some HLL sketches?
>
> I have 14 HLL sketches, and I know the exact unique counts for each of
> them. All the individual sketches give estimates within 2% of the exact
> counts.
>
> When I try to create a union, using the default lgMaxK parameter results
> in a total estimate that is way off (25% larger than the exact count).
>
> However, reducing the lgMaxK parameter in the union to 4 or 5 gives
> results that are within 2.5% of the exact counts.
>
> Also, one particular sketch seems to cause the final estimate to jump -
> not adding that sketch to the union keeps the result close to the exact
> count.
>
> Am I just seeing a very bad random error, or is there anything I'm doing
> wrong with the unions?
>
> Running on Java, using version 1.3.0. Just in case, the sketches are in
> the linked gist (hex encoded, one per line):
> https://gist.github.com/mmusnjak/c00a72b3dfbc52e780c2980acfd98351
> and the exact counts:
> https://gist.github.com/mmusnjak/dcbff67101be6cfc28ba01e63e41f73c
>
> Thank you!
> Marko Musnjak
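
For reference, the leave-one-out exact counts mentioned in the reply could be computed roughly as follows. This is only a sketch under assumptions: it presumes the raw identifiers behind each of the 14 sketches are still available as in-memory sets of strings, and the names `LeaveOneOut` and `leaveOneOutCounts` are hypothetical helpers, not part of the DataSketches API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of the DataSketches library.
final class LeaveOneOut {

    // Returns counts[i] = exact number of distinct items in the union of
    // every group except group i.
    static long[] leaveOneOutCounts(List<Set<String>> groups) {
        long[] counts = new long[groups.size()];
        for (int skip = 0; skip < groups.size(); skip++) {
            Set<String> union = new HashSet<>();
            for (int i = 0; i < groups.size(); i++) {
                if (i != skip) {
                    union.addAll(groups.get(i));
                }
            }
            counts[skip] = union.size();
        }
        return counts;
    }

    public static void main(String[] args) {
        // Toy data standing in for the raw items behind each sketch (assumed).
        List<Set<String>> groups = new ArrayList<>();
        groups.add(new HashSet<>(Arrays.asList("a", "b")));
        groups.add(new HashSet<>(Arrays.asList("b", "c")));
        groups.add(new HashSet<>(Arrays.asList("x", "y", "z")));

        long[] counts = leaveOneOutCounts(groups);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("without group " + i + ": " + counts[i]);
        }
    }
}
```

Comparing each of these exact counts with the estimate of a union built from the same 13 sketches would show whether a single sketch is responsible for the jump. The quadratic loop is fine for 14 groups; it is not meant for large numbers of sets.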