Thanks! We're investigating. We'll let you know if we have further
questions.

  jon

On Thu, Aug 13, 2020, 11:40 PM Marko Mušnjak <marko.musn...@gmail.com>
wrote:

> Hi Jon,
> The first sketch is the one where I see the jump. The exact count without
> the first sketch is 24765.
>
> The result for lgK=12 without the first sketch is 11% off, lgK=5 is within
> 2%.
>
> Thanks,
> Marko
>
> On Fri, 14 Aug 2020 at 00:24, Jon Malkin <jon.mal...@gmail.com> wrote:
>
>> Hi Marko,
>>
>> Could you please let us know two more things:
>> 1) Which is the one particular sketch that causes the estimate to jump?
>> 2) What is the exact unique count of the others without that sketch?
>>
>> It sort of seems like the first sketch, but it's hard to know for sure
>> since we don't know the true leave-one-out exact counts.
>>
>> Thanks,
>>   jon
>>
>> On Thu, Aug 13, 2020 at 8:41 AM Marko Mušnjak <marko.musn...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Could someone help me understand a behavior I see when trying to union
>>> some HLL sketches?
>>>
>>> I have 14 HLL sketches, and I know the exact unique counts for each of
>>> them. All the individual sketches give estimates within 2% of the exact
>>> counts.
>>>
>>> When I try to create a union, using the default lgMaxK parameter results
>>> in a total estimate that is way off (25% larger than the exact count).
>>>
>>> However, reducing the lgMaxK parameter in the union to 4 or 5 gives
>>> results that are within 2.5% of the exact counts.
>>>
>>> Also, one particular sketch seems to cause the final estimate to jump -
>>> not adding that sketch to the union keeps the result close to the exact
>>> count.
>>>
>>> Am I just seeing a very bad random error, or is there anything I'm doing
>>> wrong with the unions?
>>>
>>> Running on Java, using version 1.3.0. Just in case, the sketches are in
>>> the linked gist (hex encoded, one per line):
>>> https://gist.github.com/mmusnjak/c00a72b3dfbc52e780c2980acfd98351
>>> and the exact counts:
>>> https://gist.github.com/mmusnjak/dcbff67101be6cfc28ba01e63e41f73c
>>>
>>> Thank you!
>>> Marko Musnjak
>>>
>>>
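
For a rough sense of scale on the error figures quoted in this thread, here is
a minimal sketch in plain Java (no DataSketches dependency). It assumes the
classic HyperLogLog relative standard error of about 1.04 / sqrt(k), with
k = 2^lgK. That formula is an assumption for illustration only; the error
bounds the library actually documents for its estimators may differ. The class
and method names are hypothetical.

```java
// Back-of-the-envelope helper, not part of the DataSketches API.
final class HllErrorSketch {

    // Assumption: classic HLL relative standard error, 1.04 / sqrt(k), k = 2^lgK.
    static double relativeStandardError(int lgK) {
        return 1.04 / Math.sqrt(Math.pow(2, lgK));
    }

    // Signed relative error of an estimate against a known exact count.
    static double relativeError(double estimate, double exact) {
        return (estimate - exact) / exact;
    }

    public static void main(String[] args) {
        // lgK values mentioned in the thread: 4, 5 and 12.
        for (int lgK : new int[] {4, 5, 12}) {
            System.out.printf("lgK=%d  k=%d  approx RSE=%.2f%%%n",
                    lgK, 1 << lgK, 100 * relativeStandardError(lgK));
        }
    }
}
```

Under that assumption, one standard error at lgK=12 is about 1.6%, so a union
result that is 11% off is roughly seven standard errors out and is unlikely to
be plain random error. At lgK=5 one standard error is about 18%, so a result
within 2% is well inside the expected range.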
