Hello,

This info about q0 and q1 is good to know. I will use it, thank you!
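
For the archive, here is roughly how I plan to ask for those two values. This is only a sketch: it builds a `quantilesDoublesSketchToQuantiles` post-aggregator with fractions 0 and 1, and the names `valueSketch` and `minMax` are placeholders I made up for illustration.

```python
import json


def min_max_post_aggregator(sketch_name, output_name="minMax"):
    """Build a post-aggregator that asks for quantile(0) and quantile(1)."""
    return {
        "type": "quantilesDoublesSketchToQuantiles",
        "name": output_name,  # hypothetical output name
        # Refers to a quantiles sketch aggregator defined in the same query;
        # the aggregator name passed in here is a placeholder.
        "field": {"type": "fieldAccess", "fieldName": sketch_name},
        # Fraction 0 gives the min value, fraction 1 gives the max value.
        "fractions": [0.0, 1.0],
    }


if __name__ == "__main__":
    print(json.dumps(min_max_post_aggregator("valueSketch"), indent=2))
```

The resulting object would go in the `postAggregations` list of the query, next to the histogram post-aggregator.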

As a user who wants to plot the histogram, I would be glad to get the split
points alongside the histogram values.
For consistency, it would be just as useful to get them back whether the
request uses `numBins` or `splitPoints`: when I display a histogram, I don’t
want two different code paths to handle the query result.

I can imagine a few different formats, each with its pros and cons:
- list of tuples: `[ [ <bin value>, <bin count> ], ... ]`
        pro: simple
        con: the format may be confusing without docs; breaks the current
        output format (can be solved by adding a flag controlling the output)
- list of objects: `[ { "value": <value>, "count": <count> }, ... ]`
        pro: simple, timeseries-like, probably the easiest to display
        con: breaks the current output format (can be solved by adding a flag
        controlling the output)
- bins postAggregator + histogram values postAggregator:
  `{ bins: [ ... ], values: [ ... ] }`
        pro: compatible with the current format, feature is available on demand
        con: the arrays must be zipped on the client side
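
To illustrate the client-side work behind these options, here is a minimal Python sketch that labels each histogram value with the lower bound of its bin, producing the "list of objects" shape. It assumes equal-width bins between the min and max values, which is my reading of how `numBins` behaves and not something I have checked in the code. The function names are hypothetical.

```python
def bin_edges(min_value, max_value, num_bins):
    """Return num_bins + 1 edges, assuming equal-width bins (an assumption)."""
    width = (max_value - min_value) / num_bins
    # Use max_value itself for the last edge to avoid floating-point drift.
    return [min_value + i * width for i in range(num_bins)] + [max_value]


def label_histogram(min_value, max_value, values):
    """Pair each histogram value with the lower bound of its bin."""
    if not values:
        return []
    edges = bin_edges(min_value, max_value, len(values))
    # zip stops at the shorter list, so the final (upper) edge is dropped.
    return [{"value": lower, "count": count} for lower, count in zip(edges, values)]


if __name__ == "__main__":
    for row in label_histogram(0.0, 10.0, [1.0, 4.0, 3.0, 2.0]):
        print(row)
```

With the third format, the same `zip` would apply directly to the returned `bins` and `values` arrays, with no need to fetch min and max separately.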

What do you think?

--

Jérémie Girault
On 6 Nov 2020 at 19:19 +0100, Alexander Saydakov 
<sayda...@verizonmedia.com.invalid> wrote:
> quantile(0) = min value
> quantile(1) = max value
> you can use sketch-to-quantiles post agg to get min, max or any number of
> other quantiles
>
> Regarding your observation that sketch-to-histogram(num bins) does not give
> information about the computed split points: that is valuable feedback.
> Perhaps, we could consider returning the split points somehow, but I am not
> quite sure what the return type should be. We need to return two arrays:
> probability mass in each bin as we do currently - that is one array of
> doubles, and split points computed from min, max, and given number of bins.
> And this post agg can accept split points - should we return them in that
> case as well for consistency?
>
>
> On Fri, Nov 6, 2020 at 3:30 AM Jérémie Girault <jere...@hubvisor.io> wrote:
>
> > Hello everyone,
> >
> > I previously asked a question on the ASF Slack and was asked to send it
> > to the dev list instead. I just subscribed to the list to forward the
> > message I sent:
> >
> > I was playing with the DataSketches Quantiles Sketch module in Druid,
> > trying to retrieve some histograms using quantilesDoublesSketchToHistogram.
> > However, when using numBins, I couldn't label the values I retrieved for
> > each bin when trying to plot them.
> > I can’t seem to find any postAggregator that allows me to get min/max
> > values in order to recompute bins on the client side.
> > Should I use min/max aggregators when ingesting, and query them alongside
> > my histogram as a workaround? That seems like a lot of space/time for
> > values that should be "free" to retrieve from Quantile Sketches.
> > Wouldn’t it be useful to have min/max postAggregators for the
> > quantilesDoublesSketch aggregator and/or histogram bin labels?
> > I located this chunk of code:
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_druid_blob_master_extensions-2Dcore_datasketches_src_main_java_org_apache_druid_query_aggregation_datasketches_quantiles_DoublesSketchToHistogramPostAggregator.java&d=DwIFaQ&c=sWW_bEwW_mLyN3Kx2v57Q8e-CRbmiT9yOhqES_g_wVY&r=0TpvE_u2hS1ubQhK3gLhy94YgZm2k_r8JHJnqgjOXx4&m=PUk1rdn3YFgKzf5pRy7hKdCZt_J-_DZgbh_wjexBneI&s=fb3Uh150BuY9jtM8DqGofrqtwQrM9jDfupPq6MwF5hk&e=
> > That does not seem so complicated that I could not contribute, but
> > I’m not used to Java development these days and it would take me a while
> > to get it right.
> > Would such features be considered if requested/submitted?
> >
> > Thank you,
> >
> > --
> >
> > Jérémie Girault
> >
