Re: Ad impression counting and unique users counting using data sketches

2021-09-16 Thread Kartik Mahajan
Thanks for your inputs, Karl and Lee. Regards, Kartik On Fri, Sep 17, 2021 at 6:15 AM leerho wrote: > Kartik, >> >> *Do you think this is a good model to solve Q2?* > > Your Q2 is in the domain of unique users. So, Yes. And, if you are using > Druid to do effectively a "select and group-by" of

Re: Ad impression counting and unique users counting using data sketches

2021-09-16 Thread leerho
Kartik, > > *Do you think this is a good model to solve Q2?* Your Q2 is in the domain of unique users. So, Yes. And, if you are using Druid to do effectively a "select and group-by" of the raw data used to feed the two sketches, then just using Theta Sketches is sufficient. The Tuple Sketches a

Re: Ad impression counting and unique users counting using data sketches

2021-09-16 Thread Karl Matthias
Hi Kartik, I certainly don't have the expertise with this that Lee does, but stepping back from your specific examples, to use a Theta sketch: 1. All of the sets/sketches you want to have interact together must contain the same domain values, be that User-ID or Impression-ID or something

Re: Ad impression counting and unique users counting using data sketches

2021-09-15 Thread Kartik Mahajan
Hi Lee, I am grateful to you for your inputs. Thank you so much. Let's focus on Q2 and let me explain what I meant by "So after roll-up, I would end up with 1 theta-sketch per dimension value per day (assuming day level granularity) and I can do unions and intersections on these sketches to answ
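
The roll-up idea in this message (one sketch per dimension value per day, combined with unions and intersections) can be illustrated with a toy sketch. This is a minimal pure-Python approximation of the Theta/KMV idea, not the Apache DataSketches or Druid implementation; the class `ToyThetaSketch`, the helpers `union` and `intersection`, the user IDs, and the dimension names are all hypothetical. Note that every sketch is fed the same domain of values (user IDs), which is what makes set operations between them meaningful.

```python
import hashlib

MAX_HASH = 2 ** 64  # hashes are treated as points in [0, 2**64)


def _hash64(item: str) -> int:
    """Map an item to a 64-bit integer (SHA-256 prefix, for illustration only)."""
    return int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:8], "big")


class ToyThetaSketch:
    """Toy KMV-style sketch: keeps at most k hashes, all below a threshold theta."""

    def __init__(self, k: int = 1024):
        self.k = k
        self.theta = MAX_HASH  # no sampling yet: every hash is retained
        self.hashes = set()

    def update(self, item: str) -> None:
        h = _hash64(item)
        if h < self.theta:
            self.hashes.add(h)
            if len(self.hashes) > self.k:
                self._trim()

    def _trim(self) -> None:
        # Keep the k smallest hashes; theta becomes the (k+1)-th smallest.
        ordered = sorted(self.hashes)
        self.theta = ordered[self.k]
        self.hashes = set(ordered[: self.k])

    def estimate(self) -> float:
        # Retained count scaled by the inverse of the sampled fraction.
        return len(self.hashes) * MAX_HASH / self.theta


def union(a: ToyThetaSketch, b: ToyThetaSketch) -> ToyThetaSketch:
    out = ToyThetaSketch(k=min(a.k, b.k))
    out.theta = min(a.theta, b.theta)
    out.hashes = {h for h in a.hashes | b.hashes if h < out.theta}
    if len(out.hashes) > out.k:
        out._trim()
    return out


def intersection(a: ToyThetaSketch, b: ToyThetaSketch) -> ToyThetaSketch:
    out = ToyThetaSketch(k=min(a.k, b.k))
    out.theta = min(a.theta, b.theta)
    out.hashes = {h for h in a.hashes & b.hashes if h < out.theta}
    return out


def sketch_of(user_ids) -> ToyThetaSketch:
    s = ToyThetaSketch()
    for uid in user_ids:
        s.update(uid)
    return s


# Hypothetical roll-up: one sketch of user IDs per dimension value per day.
campaign_a_day1 = sketch_of(f"user-{i}" for i in range(0, 100))
campaign_a_day2 = sketch_of(f"user-{i}" for i in range(50, 150))
device_mobile = sketch_of(f"user-{i}" for i in range(120, 200))

# Unique users who saw campaign A on either day.
campaign_a = union(campaign_a_day1, campaign_a_day2)
# Unique users who saw campaign A and were on mobile.
campaign_a_mobile = intersection(campaign_a, device_mobile)

print(campaign_a.estimate(), campaign_a_mobile.estimate())
```

With fewer than k distinct users per sketch, as here, the toy sketch retains every hash and the counts are exact; once a sketch holds more than k distinct values it starts sampling and the result becomes an estimate.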

Re: Ad impression counting and unique users counting using data sketches

2021-09-15 Thread leerho
Hi Kartik, The problem you describe is typical for on-line advertising and similar to ones we have worked on before. Solving this problem with sketches will provide approximate results in near-real time. However, doing so even with sketches may require considerably more resources than you may be p