Yes, Druid does this on top of a specialized tuple sketch called ArrayOfDoublesSketch (in Java). Each key in the sketch has an associated array of double values, so position j of that array, taken across all retained keys, forms a column. Druid's PostAggregator functions convert these columns into means and variances using org.apache.commons.math3.stat.descriptive.SummaryStatistics. Here is the code for variance: https://github.com/apache/druid/blob/master/extensions-core/datasketches/src/main/java/org/apache/druid/query/aggregation/datasketches/tuple/ArrayOfDoublesSketchToVariancesPostAggregator.java
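To illustrate the post-aggregation step, here is a minimal, dependency-free sketch. It assumes the sketch's retained entries have already been pulled out into a plain List<double[]> (one array per retained key). In DataSketches Java those would come from iterating the sketch, but a list stands in for it here. Druid's post-aggregator linked above feeds each column into a SummaryStatistics. This version swaps in Welford's online algorithm so it needs no commons-math3 jar. It returns the bias-corrected (n - 1) sample variance, which is what SummaryStatistics.getVariance() reports. The class name ColumnVariances and its method are hypothetical and not part of Druid or DataSketches.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of Druid or DataSketches): per-column sample
 * variance over the value arrays retained by an ArrayOfDoublesSketch-like
 * structure. Each double[] holds one retained key's values; column j is
 * values[j] across all retained keys.
 */
final class ColumnVariances {

    private ColumnVariances() {}

    /**
     * Returns the bias-corrected (n - 1) sample variance of each column,
     * using Welford's online algorithm so entries can be streamed one at a
     * time, the way a sketch iterator hands them out.
     */
    static double[] variances(List<double[]> entries, int numValues) {
        long n = 0;
        double[] mean = new double[numValues];
        double[] m2 = new double[numValues]; // running sum of squared deviations
        for (double[] values : entries) {
            if (values.length != numValues) {
                throw new IllegalArgumentException("expected " + numValues + " values per entry");
            }
            n++;
            for (int j = 0; j < numValues; j++) {
                double delta = values[j] - mean[j];
                mean[j] += delta / n;
                m2[j] += delta * (values[j] - mean[j]);
            }
        }
        double[] result = new double[numValues];
        for (int j = 0; j < numValues; j++) {
            // Same edge cases as SummaryStatistics: NaN with no data, 0 with one value.
            if (n == 0) {
                result[j] = Double.NaN;
            } else if (n == 1) {
                result[j] = 0.0;
            } else {
                result[j] = m2[j] / (n - 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Three retained keys, two value columns each.
        List<double[]> entries = List.of(
                new double[] {1.0, 10.0},
                new double[] {2.0, 10.0},
                new double[] {3.0, 10.0});
        double[] v = variances(entries, 2);
        System.out.println(v[0] + " " + v[1]); // prints "1.0 0.0"
    }
}
```

The variance is taken across retained keys, not across raw input rows. As I understand it, updating an ArrayOfDoublesUpdatableSketch with a key it already holds adds the new values to that key's stored array, so each column entry is a per-key aggregate. When the sketch is in estimation mode, the retained keys are a sample of the distinct keys, and the result is an estimate over that sample.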
On Mon, Jan 16, 2023 at 5:54 AM Tomer B <tomer...@gmail.com> wrote:
> Thanks, yeah! (Tuple sketch and not theta, as you said!)
> I have another question, please. I looked at the tuple sketch:
> https://datasketches.apache.org/api/java/snapshot/apidocs/org/apache/datasketches/tuple/aninteger/IntegerSummary.Mode.html
> and I see the possible values of Mode are Sum, Min, Max, and AlwaysOne, so I
> don't see 'Variance'. So does the tuple sketch not support variance out of
> the box? I looked at Druid and I see it does support a variance sketch:
> https://druid.apache.org/docs/latest/development/extensions-core/datasketches-tuple.html#variance-values-for-each-column
> Does this mean the following: tuple sketches do not support variance out
> of the box, but since Druid supports it on top of tuple sketches, it's
> probably possible for me to add a similar implementation on top
> of DataSketches tuple sketches?
>
> Thanks!
>
> On Sun, Jan 1, 2023 at 2:03 AM Jon Malkin <jon.mal...@gmail.com> wrote:
>
>> I believe you're looking at the tuple sketch code in Java, not the theta
>> sketch. We don't yet have tuple support in C++ (on which Python is based).
>> It's planned, but I haven't yet had time to sit down and figure out how to
>> do it -- and specifically how to do so with a reasonable API.
>>
>> jon