I just looked at the source for QDigest from stream-lib.

I think the memory usage could be trimmed substantially, possibly by as
much as 5:1, by using more primitive-friendly structures.
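
As a rough sketch of what "primitive-friendly" could mean here: assuming the
per-node counts are currently held in a boxed map such as HashMap<Long, Long>
(an assumption about the implementation, not something stated above), the same
information fits in two parallel long[] arrays sorted by key. The class and
method names below are hypothetical and are not part of stream-lib.

```java
import java.util.Arrays;

// Hypothetical illustration, not stream-lib code: node counts kept in two
// parallel primitive arrays sorted by key, instead of a HashMap<Long, Long>.
final class PrimitiveCounts {
    private long[] keys = new long[16];
    private long[] counts = new long[16];
    private int size = 0;

    // Adds delta to the count stored for key, inserting the key if absent.
    void add(long key, long delta) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        if (i >= 0) {
            counts[i] += delta;
            return;
        }
        int at = -(i + 1); // insertion point that keeps keys sorted
        if (size == keys.length) {
            keys = Arrays.copyOf(keys, size * 2);
            counts = Arrays.copyOf(counts, size * 2);
        }
        // Shift the tail right by one slot to make room for the new entry.
        System.arraycopy(keys, at, keys, at + 1, size - at);
        System.arraycopy(counts, at, counts, at + 1, size - at);
        keys[at] = key;
        counts[at] = delta;
        size++;
    }

    // Returns the count for key, or 0 if the key has never been added.
    long get(long key) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        return i >= 0 ? counts[i] : 0L;
    }

    // Merges another summary into this one by adding counts key by key.
    // A real Q-digest would also run its compression step afterwards;
    // that step is deliberately left out of this sketch.
    void merge(PrimitiveCounts other) {
        for (int i = 0; i < other.size; i++) {
            add(other.keys[i], other.counts[i]);
        }
    }

    int size() {
        return size;
    }
}
```

Each entry here costs two primitive longs plus some array slack, with no
per-entry objects. How close that gets to 5:1 would have to be measured
against live data.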



On Wed, Aug 7, 2013 at 3:04 PM, Otis Gospodnetic <[email protected]> wrote:

> Hi Ted,
>
> I need percentiles.  Ideally not pre-defined ones, because one person may
> want e.g. 70th pctile, while somebody else might want 75th pctile for the
> same metric.
>
> Deal breakers:
> High memory footprint. ("high" means "higher than QDigest from stream-lib"
> for us.... and we could test and compare with QDigest relatively easily
> with live data)
> Algos that create data structures that cannot be merged
> Loss of accuracy that is not predictably small or configurable
>
> Thank you,
> Otis
> ----
>
> Performance Monitoring for Solr / ElasticSearch / Hadoop / HBase -
> http://sematext.com/spm
>
>
>
>
> >________________________________
> > From: Ted Dunning <[email protected]>
> >To: "[email protected]" <[email protected]>; Otis Gospodnetic <
> [email protected]>
> >Sent: Wednesday, August 7, 2013 11:48 PM
> >Subject: Re: Is OnlineSummarizer mergeable?
> >
> >
> >
> >Otis,
> >
> >
> >What statistics do you need?
> >
> >
> >What guarantees?
> >
> >
> >
> >
> >
> >On Wed, Aug 7, 2013 at 1:26 PM, Otis Gospodnetic <
> [email protected]> wrote:
> >
> >Hi Ted,
> >>
> >>I'm actually trying to find an alternative to QDigest (the stream-lib
> impl specifically) because even though it seems good, we have to deal with
> crazy volumes of data in SPM (performance monitoring service, see
> signature)... I'm hoping we can find something that has both a lower memory
> footprint than QDigest AND that is mergeable a la QDigest.  Utopia?
> >>
> >>Thanks,
> >>Otis
> >>----
> >>Performance Monitoring for Solr / ElasticSearch / Hadoop / HBase -
> http://sematext.com/spm
> >>
> >>
> >>
> >>
> >>>________________________________
> >>> From: Ted Dunning <[email protected]>
> >>>To: "[email protected]" <[email protected]>
> >>>Sent: Wednesday, August 7, 2013 4:51 PM
> >>>Subject: Re: Is OnlineSummarizer mergeable?
> >>>
> >>>
> >>>It isn't as mergeable as I would like.  If you have randomized record
> >>>selection, it should be possible, but perverse ordering can cause
> serious
> >>>errors.
> >>>
> >>>It would be better to use something like a Q-digest.
> >>>
> >>>http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf
> >>>
> >>>
> >>>
> >>>
> >>>On Wed, Aug 7, 2013 at 4:21 AM, Otis Gospodnetic <
> [email protected]
> >>>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Is OnlineSummarizer algo "mergeable"?
> >>>>
> >>>> Say that we compute a percentile for some metric for time 12:00-12:01
> >>>> and store that somewhere, then we compute it for 12:01-12:02 and store
> >>>> that separately, and so on.
> >>>>
> >>>> Can we then later merge these computed and previously stored
> >>>> percentile "instances" and get an accurate value?
> >>>>
> >>>> Thanks,
> >>>> Otis
> >>>> --
> >>>> Performance Monitoring -- http://sematext.com/spm
> >>>> Solr & ElasticSearch Support -- http://sematext.com/
> >>>>
> >>>
> >>>
> >>>
> >
> >
> >
