Many thanks, Phil, for answering all my questions.

On Tue, Oct 14, 2014 at 10:19 PM, Phil Steitz <phil.ste...@gmail.com> wrote:
> On 10/14/14 6:59 AM, venkatesha murthy wrote:
> > OK.
> >
> > I wanted to understand the advantage of having a container class for all storeless stats (just as DescriptiveStatistics is for UnivariateStatistic). I could open another email thread.
>
> SummaryStatistics is a container for storeless stats; DescriptiveStatistics is for stats computed over a stored dataset, possibly with a rolling window. The rationale here is that SummaryStatistics aggregates StorelessUnivariateStatistic instances, while DescriptiveStatistics aggregates statistics that implement only UnivariateStatistic, which requires that the full set of data be provided as an input array (so the aggregate has to maintain a dataset in memory). The advantage of having a container for storeless stats is that a stream of data can be fed into the container's addValue method and the constituent stats will all get updated with the values as they come in.
>
> > I also wanted to understand the abstract interface problem you were referring to.
>
> We moved to favoring abstract classes (where needed / useful) over interfaces because it is easier to add to / modify abstract classes than interfaces in a backward compatible way.
>
> > thanks
> > murthy
> >
> > On Tue, Oct 14, 2014 at 9:47 AM, Phil Steitz <phil.ste...@gmail.com> wrote:
> >
> >> On 10/13/14 8:55 PM, venkatesha murthy wrote:
> >>> On Tue, Oct 14, 2014 at 6:05 AM, Phil Steitz <phil.ste...@gmail.com> wrote:
> >>>> On 10/13/14 1:04 PM, venkatesha murthy wrote:
> >>>>> Adding a bit more on this:
> >>>>> a) The DescriptiveStatisticalSummary actually handles the rest of the functions, such as addValue, getPercentile, etc.
> >>>>> b) I have added addValue() as it is important to see both the storeless and the storing variants as interfaces.
> >>>>> c) A case in point (for b): I was actually trying out lock-based and lock-free variants of the descriptive statistical summary, and using an interface that has all the functions common to every variant kept them very concise and consistent.
> >>>>> d) The lock-based and lock-free variants are not part of this patch, as I am still working through them.
> >>>>>
> >>>>> However, I feel getPercentile can definitely add value. Please let me know if I could move all the relevant methods of DescriptiveStorelessStatistics (such as kurtosis, skewness, etc.) into the statistical summary; then we could just use SummaryStatistics.
> >>>> I am not sure I understand what you are proposing. Currently, we have two statistical "aggregates" for descriptive univariate stats:
> >>>> SummaryStatistics - aggregates "storeless" statistics over a stream of data that is not stored in memory
> >>>> DescriptiveStatistics - provides an extended set of statistics, some of which require that the full set of data be stored in memory
> >>>>
> >>> OK. I am sorry for the confusion here. I understand the intent now. However, what I wanted to convey was that all the statistics supported by the current DescriptiveStatistics can be supported in a storeless variant as well (e.g. skewness, kurtosis, percentile).
> >> No. For example, exact percentiles, or even arbitrary percentiles without the quantile (e.g. quartile) specified in advance, can't be computed without storing the data. Also, DescriptiveStatistics supports a rolling window, and the stats it implements can make use of multi-pass algorithms.
> >>
> >>> Therefore, what I was proposing is to have a common interface that has all these methods too.
> >>> For example (we can change the name if needed):
> >>>
> >>> public interface DescriptiveStatisticalSummary<S extends UnivariateStatistic> extends StatisticalSummary {
> >>>     double getKurtosis();
> >>>     double getPercentile(double p);
> >>>     double getSkewness();
> >>>     // Add mutation methods as well
> >>>     void addValue(double d);
> >>>     // Provide additional builder methods for injecting custom percentile,
> >>>     // kurtosis, skewness, variance, etc.
> >>>     DescriptiveStatisticalSummary<S> withPercentile(S percentile);
> >>>     DescriptiveStatisticalSummary<S> withKurtosis(S kurtosis);
> >>> }
> >> Per comments above, the contracts of these aggregates are different. We have also moved away from defining abstract interfaces, as these end up creating problems when we want to add things (as in the subject of this thread).
> >>
> >> Phil
> >>>> The subject of this thread was a proposal to add quartiles to SummaryStatistics, as the new(ish) PSquarePercentile allows those statistics to be computed without storing the data.
> >>>>
> >>> Agreed. I was just adding points on how we can bring both DescriptiveStatistics and SummaryStatistics under a common interface for all the stats.
> >>>
> >>>> Phil
> >>>>> On Tue, Oct 14, 2014 at 1:15 AM, venkatesha murthy <venkateshamurth...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi Phil,
> >>>>>>
> >>>>>> Though I did not add to StatisticalSummary, I was actually working on a DescriptiveStatisticalSummary for all the storeless variants, including PSquarePercentile. Would it help if SummaryStatistics implemented an extended interface such as DescriptiveStatisticalSummary (below)?
> >>>>>>
> >>>>>> That said, I actually wanted to discuss the new storeless variant of descriptive statistics:
> >>>>>> a) DescriptiveStatisticalSummary - an interface extending StatisticalSummary (adds a generic type parameter that can cater for both storing and storeless variants)
> >>>>>> b) DescriptiveStorelessStatistics - a storeless variant of DescriptiveStatistics
> >>>>>> c) SynchronizedDescriptiveStorelessStatistics - a synchronized wrapper
> >>>>>> Test case classes have been added for these as well.
> >>>>>>
> >>>>>> Please let me know your thoughts on this; I could also accommodate the changes to SummaryStatistics based on this change.
> >>>>>> Also, please let me know if this could be raised as a JIRA ticket.
> >>>>>> Thanks
> >>>>>> Murthy
> >>>>>>
> >>>>>> On Sat, Oct 11, 2014 at 1:10 AM, Phil Steitz <phil.ste...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Now that we have a "storeless" percentile estimator, we can add quartile computation to SummaryStatistics. Any objections to my adding this? I could optionally add a boolean constructor argument to avoid the overhead of maintaining these stats. Or, more generally, add a bitfield encoding the exact set of stats the user wants to maintain. If there are no objections to the addition, I will open a JIRA.
> >>>>>>>
> >>>>>>> Phil
> >>>>>>>
> >>>>>>> ---------------------------------------------------------------------
> >>>>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> >>>>>>> For additional commands, e-mail: dev-h...@commons.apache.org
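The storeless-container rationale in the thread (feed a stream into addValue and let every constituent statistic update itself) can be sketched in plain Java. This is a hypothetical illustration, not the Commons Math API: StorelessContainerSketch, StorelessStat, RunningMean, RunningVariance and StreamingSummary are invented names, and the variance uses Welford's one-pass update as an assumed stand-in for whatever algorithm the library actually uses. The base type is an abstract class rather than an interface, echoing the backward-compatibility point made in the thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, NOT the Commons Math API: a container of "storeless"
// statistics that updates each of them as values stream in, without storing data.
public class StorelessContainerSketch {

    // Abstract base class rather than an interface: a new method with a default
    // body can be added later without breaking existing subclasses.
    abstract static class StorelessStat {
        protected long n = 0;

        // Consume one value; the value itself is not retained.
        abstract void increment(double x);

        // Current value, or NaN if no values have been seen yet.
        abstract double getResult();

        long getN() { return n; }
    }

    // Running mean, updated incrementally.
    static final class RunningMean extends StorelessStat {
        private double mean = 0.0;

        @Override
        void increment(double x) {
            n++;
            mean += (x - mean) / n;
        }

        @Override
        double getResult() {
            return n == 0 ? Double.NaN : mean;
        }
    }

    // Bias-corrected (n - 1) variance using Welford's one-pass update.
    static final class RunningVariance extends StorelessStat {
        private double mean = 0.0;
        private double m2 = 0.0; // running sum of squared deviations from the mean

        @Override
        void increment(double x) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }

        @Override
        double getResult() {
            if (n == 0) {
                return Double.NaN;
            }
            return n == 1 ? 0.0 : m2 / (n - 1);
        }
    }

    // Container in the spirit of SummaryStatistics: addValue forwards each value
    // to every constituent statistic, after which the value is discarded.
    static final class StreamingSummary {
        private final RunningMean mean = new RunningMean();
        private final RunningVariance variance = new RunningVariance();
        private final List<StorelessStat> stats =
                Arrays.<StorelessStat>asList(mean, variance);

        void addValue(double x) {
            for (StorelessStat s : stats) {
                s.increment(x);
            }
        }

        long getN() { return mean.getN(); }

        double getMean() { return mean.getResult(); }

        double getVariance() { return variance.getResult(); }
    }

    public static void main(String[] args) {
        StreamingSummary summary = new StreamingSummary();
        for (double x : new double[] {1, 2, 3, 4, 5}) {
            summary.addValue(x); // values arrive one at a time; none are stored
        }
        // prints: n=5 mean=3.0 variance=2.5
        System.out.println("n=" + summary.getN() + " mean=" + summary.getMean()
                + " variance=" + summary.getVariance());
    }
}
```

Because no value is retained, exact percentiles cannot be added to a container like this; only a streaming estimate can, which is the gap the thread says PSquarePercentile fills for the proposed quartiles. Statistics that need the full dataset, a rolling window, or multiple passes belong on the DescriptiveStatistics side instead.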