Yes, Druid does this on top of a specialized Tuple sketch called
ArrayOfDoublesSketch (in Java).
Each key in the sketch has an array of double values associated with it,
and each position in that array acts as a column.
PostAggregator functions convert these columns into means and variances by
passing the retained values through
org.apache.commons.math3.stat.descriptive.SummaryStatistics.
Here is the code for variance:
https://github.com/apache/druid/blob/master/extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchToVariancesPostAggregator.java
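
To make the idea concrete, here is a minimal sketch of the per-column
variance computation in plain Java. It is not the Druid code. The double[][]
input is a stand-in for the value arrays you would read from the sketch's
retained entries, the names ColumnVariances and variances are made up for
illustration, and Welford's online update is used in place of
SummaryStatistics so the example has no dependencies. It computes the sample
variance (n - 1 denominator) and returns NaN for fewer than two rows, which
is my own choice here.

```java
import java.util.Arrays;

final class ColumnVariances {

    // rows: one double[] per retained key. This is a hypothetical stand-in
    // for the value arrays read from the sketch; every row is assumed to
    // have at least numValues entries.
    static double[] variances(double[][] rows, int numValues) {
        long n = 0;
        double[] mean = new double[numValues];
        double[] m2 = new double[numValues]; // running sum of squared deviations

        for (double[] values : rows) {
            n++;
            for (int i = 0; i < numValues; i++) {
                // Welford's update: numerically stable, single pass.
                double delta = values[i] - mean[i];
                mean[i] += delta / n;
                m2[i] += delta * (values[i] - mean[i]);
            }
        }

        double[] result = new double[numValues];
        for (int i = 0; i < numValues; i++) {
            // Sample variance; undefined (NaN) with fewer than two rows.
            result[i] = n < 2 ? Double.NaN : m2[i] / (n - 1);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] rows = {
            {1.0, 10.0},
            {2.0, 20.0},
            {3.0, 30.0},
        };
        // prints [1.0, 100.0]
        System.out.println(Arrays.toString(variances(rows, 2)));
    }
}
```

With the real sketch you would fill the rows by iterating over the retained
entries instead of building an array up front. Note that the result
describes the values retained in the sketch, so it is an estimate for the
full stream once the sketch has gone into sampling mode.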




On Mon, Jan 16, 2023 at 5:54 AM Tomer B <tomer...@gmail.com> wrote:

> Thanks, yes (the tuple sketch, not theta, as you said).
> I have another question. I looked at the tuple sketch documentation here:
> https://datasketches.apache.org/api/java/snapshot/apidocs/org/apache/datasketches/tuple/aninteger/IntegerSummary.Mode.html
> and I see that the possible values of Mode are Sum, Min, Max and AlwaysOne,
> so there is no 'Variance'. Does the tuple sketch not support variance out
> of the box? I looked at Druid and saw that it does support variance:
> https://druid.apache.org/docs/latest/development/extensions-core/datasketches-tuple.html#variance-values-for-each-column
> Does this mean that tuple sketches do not support variance out of the
> box, but since Druid supports it on top of tuple sketches, I could
> probably add a similar implementation on top of the DataSketches tuple
> sketches?
>
> Thanks!
>
>
> On Sun, Jan 1, 2023 at 2:03 AM Jon Malkin <jon.mal...@gmail.com> wrote:
>
>> I believe you're looking at the tuple sketch code in java, not theta
>> sketch. We don't yet have tuple support in C++ (on which python is based).
>> It's planned, but I haven't yet had time to sit down and figure out how to
>> do it -- and specifically how to do so with a reasonable API.
>>
>>   jon
>>