Hi Ted, Yes, that's what we did recently, too: https://github.com/clearspring/stream-lib/pull/47
... but it's still a little too phat... which is what made me think of your OnlineSummarizer as a possible, slimmer alternative.

Otis
----
Performance Monitoring for Solr / ElasticSearch / Hadoop / HBase - http://sematext.com/spm

>________________________________
> From: Ted Dunning <[email protected]>
>To: "[email protected]" <[email protected]>; Otis Gospodnetic <[email protected]>
>Sent: Thursday, August 8, 2013 8:27 AM
>Subject: Re: Is OnlineSummarizer mergeable?
>
>I just looked at the source for QDigest from streamlib.
>
>I think that the memory usage could be trimmed substantially, possibly by as
>much as 5:1 by using more primitive-friendly structures.
>
>On Wed, Aug 7, 2013 at 3:04 PM, Otis Gospodnetic <[email protected]> wrote:
>
>>Hi Ted,
>>
>>I need percentiles. Ideally not pre-defined ones, because one person may
>>want e.g. 70th pctile, while somebody else might want 75th pctile for the
>>same metric.
>>
>>Deal breakers:
>>- High memory footprint. ("high" means "higher than QDigest from stream-lib"
>>  for us.... and we could test and compare with QDigest relatively easily with
>>  live data)
>>- Algos that create data structures that cannot be merged
>>- Loss of accuracy that is not predictably small or configurable
>>
>>Thank you,
>>Otis
>>----
>>Performance Monitoring for Solr / ElasticSearch / Hadoop / HBase -
>>http://sematext.com/spm
>>
>>>________________________________
>>> From: Ted Dunning <[email protected]>
>>>To: "[email protected]" <[email protected]>; Otis Gospodnetic <[email protected]>
>>>Sent: Wednesday, August 7, 2013 11:48 PM
>>>Subject: Re: Is OnlineSummarizer mergeable?
>>>
>>>Otis,
>>>
>>>What statistics do you need?
>>>
>>>What guarantees?
>>>
>>>On Wed, Aug 7, 2013 at 1:26 PM, Otis Gospodnetic <[email protected]> wrote:
>>>
>>>>Hi Ted,
>>>>
>>>>I'm actually trying to find an alternative to QDigest (the stream-lib impl
>>>>specifically) because even though it seems good, we have to deal with crazy
>>>>volumes of data in SPM (performance monitoring service, see signature)...
>>>>I'm hoping we can find something that has both a lower memory footprint
>>>>than QDigest AND that is mergeable a la QDigest. Utopia?
>>>>
>>>>Thanks,
>>>>Otis
>>>>----
>>>>Performance Monitoring for Solr / ElasticSearch / Hadoop / HBase -
>>>>http://sematext.com/spm
>>>>
>>>>>________________________________
>>>>> From: Ted Dunning <[email protected]>
>>>>>To: "[email protected]" <[email protected]>
>>>>>Sent: Wednesday, August 7, 2013 4:51 PM
>>>>>Subject: Re: Is OnlineSummarizer mergeable?
>>>>>
>>>>>It isn't as mergeable as I would like. If you have randomized record
>>>>>selection, it should be possible, but perverse ordering can cause serious
>>>>>errors.
>>>>>
>>>>>It would be better to use something like a Q-digest.
>>>>>
>>>>>http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf
>>>>>
>>>>>On Wed, Aug 7, 2013 at 4:21 AM, Otis Gospodnetic <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Is OnlineSummarizer algo "mergeable"?
>>>>>>
>>>>>> Say that we compute a percentile for some metric for time 12:00-12:01
>>>>>> and store that somewhere, then we compute it for 12:01-12:02 and store
>>>>>> that separately, and so on.
>>>>>>
>>>>>> Can we then later merge these computed and previously stored
>>>>>> percentile "instances" and get an accurate value?
>>>>>>
>>>>>> Thanks,
>>>>>> Otis
>>>>>> --
>>>>>> Performance Monitoring -- http://sematext.com/spm
>>>>>> Solr & ElasticSearch Support -- http://sematext.com/
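
P.S. To make the "mergeable" requirement from the start of the thread concrete, here is a rough sketch of what I mean by storing a per-minute summary and merging later. This is NOT Q-digest and NOT OnlineSummarizer; it is a plain fixed-bin histogram that I made up for illustration (the class name FixedBinSummary and all its parameters are hypothetical), and it assumes the value range is known up front. The point is only that the stored thing is a set of counts rather than a single percentile number, so merging is addition and any percentile can be asked for afterwards, with error bounded by one bin width for in-range values.

```python
from typing import List


class FixedBinSummary:
    """Illustrative mergeable summary: equal-width bins over a known range [lo, hi).

    Hypothetical helper, not part of stream-lib or Mahout.
    """

    def __init__(self, lo: float, hi: float, bins: int) -> None:
        self.lo = lo
        self.hi = hi
        self.bins = bins
        self.width = (hi - lo) / bins
        self.counts: List[int] = [0] * bins

    def add(self, x: float) -> None:
        # Out-of-range values are clamped into the first or last bin.
        i = int((x - self.lo) / self.width)
        self.counts[min(max(i, 0), self.bins - 1)] += 1

    def merge(self, other: "FixedBinSummary") -> "FixedBinSummary":
        # Merging is element-wise addition of counts, so the order in which
        # summaries (or records) arrive does not affect the result.
        if (self.lo, self.hi, self.bins) != (other.lo, other.hi, other.bins):
            raise ValueError("summaries must share the same bin layout")
        merged = FixedBinSummary(self.lo, self.hi, self.bins)
        merged.counts = [a + b for a, b in zip(self.counts, other.counts)]
        return merged

    def quantile(self, q: float) -> float:
        # Walk the cumulative counts and return the upper edge of the bin
        # in which the q-th fraction of the observations is reached.
        total = sum(self.counts)
        if total == 0:
            raise ValueError("empty summary")
        target = q * total
        seen = 0
        for i, c in enumerate(self.counts):
            seen += c
            if seen >= target:
                return self.lo + (i + 1) * self.width
        return self.hi


if __name__ == "__main__":
    # One summary per minute, stored separately, merged later.
    minute_1 = FixedBinSummary(0.0, 200.0, 20)
    minute_2 = FixedBinSummary(0.0, 200.0, 20)
    for v in range(0, 100):
        minute_1.add(v)
    for v in range(100, 200):
        minute_2.add(v)
    both = minute_1.merge(minute_2)
    print(both.quantile(0.70), both.quantile(0.75))
```

The memory cost here is fixed by the number of bins, which is the knob that trades footprint against accuracy; Q-digest gets a similar trade-off without needing equal-width bins.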
