Phil Steitz wrote:
> We should be able to find a clean way to do what this enhancement
> request is asking for.  I am feeling stupid because even when I consider
> breaking compatibility / refactoring to use generics, I can't find a
> simple way to do it.  Here is a description of the current API and some
> failed ideas that I have considered so far.   As usual, I would like to
> minimize pain for current users in addressing this, but at this point I
> am starting to think that wholesale refactoring is necessary and I would
> appreciate ideas on the best way to do this.
> 
> SummaryStatistics provides "storeless" computation of summary statistics
> - min, max, mean, variance, etc.  Here "storeless" means that the class
> does not hold the stream of data in memory.  It was designed to support
> pluggable implementations of the statistics that it computes.  It does
> this in a way that looks smelly in the new world of type-safe Java
> (well, maybe it always smelled ;)  The injectable implementation classes
> in SummaryStatistics are typed as "StorelessUnivariateStatistic" which
> is an interface that includes things like getResult() and
> increment(double).  There is nothing preventing, for example, a variance
> implementation from being "plugged in" to implement the mean.
> 
> The request in MATH-224 is to support aggregation in the following
> sense:  SummaryStatistics instance 1 gets a stream of values and
> instance 2 gets another stream of values and we want to create a new
> instance or replace instance 1 with an instance that behaves as though
> it got all the data from both streams.  The simplest way to do this
> would be to add an "aggregate" method to the
> StorelessUnivariateStatistic interface and then just implement
> aggregation in SummaryStatistics by delegation to the implementation
> instances.  This is essentially what the patch attached to MATH-224
> does.  The problem with this approach is that supporting aggregation is
> a fairly strong requirement in general, stronger than just requiring
> that the statistic be computable without storing the data.  Stronger
> still is the requirement that an implementation of a statistic be
> "aggregatable" with a possibly different implementation (since then it
> would have access only to the value of the other statistic).
> 
> So the challenge is whether we can find a clean way to achieve four objectives:
> 
> 0) Maintain pluggability of statistics implementations
> 1) Support aggregation
> 2) Improve type safety
> 3) Minimize trauma for current users
> 
> Dropping 0) makes things much simpler, but I would like to avoid that
> unless there is really no way to accomplish 1) and 2) without taking
> that step.  Strictly speaking, 1) may be impossible as I know of no way
> to support this for the higher moments.  I would be OK with aggregation
> forcing these to NaN (documented, of course).

I think 1) has a high priority. It is the whole subject of the issue. I
see several use cases for it, for example computing in parallel and
merging the results later. I also think that providing it for higher
moments requires not only the final result but also intermediate values
(typically sum(x^0), sum(x^1), sum(x^2) ...). This implies that these
values must be available in addition to the final results.
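
To make the idea of merging intermediate values concrete, here is a
minimal sketch. PowerSums and all of its methods are hypothetical names
used only for illustration; nothing here is existing Commons Math API.
It keeps sum(x^0), sum(x^1) and sum(x^2), so aggregation reduces to
adding the sums. Raw power sums are the simplest thing to merge but can
lose precision when the mean is large relative to the spread, so this
is not meant as a numerically robust implementation.

```java
/** Hypothetical accumulator of raw power sums; not part of Commons Math. */
final class PowerSums {
    private long n;     // sum(x^0): number of values seen
    private double s1;  // sum(x^1)
    private double s2;  // sum(x^2)

    /** Adds one value without storing it. */
    void increment(double x) {
        n++;
        s1 += x;
        s2 += x * x;
    }

    /** Afterwards this instance behaves as though it had also seen the other stream. */
    void aggregate(PowerSums other) {
        n += other.n;
        s1 += other.s1;
        s2 += other.s2;
    }

    long getN() {
        return n;
    }

    /** Mean as the ratio sum(x^1) / sum(x^0); NaN when no values were added. */
    double getMean() {
        return n == 0 ? Double.NaN : s1 / n;
    }

    /** Bias-corrected sample variance; NaN when empty, 0 for a single value. */
    double getVariance() {
        if (n == 0) {
            return Double.NaN;
        }
        if (n == 1) {
            return 0.0;
        }
        return (s2 - s1 * s1 / n) / (n - 1);
    }
}
```

Two instances fed in parallel can then be combined with
first.aggregate(second), and the result should match a single instance
that saw both streams, up to rounding.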

Perhaps one way to allow it would be to have different interfaces for
the various statistics. For example, Mean would provide sum(x^0) and
sum(x^1) in addition to the final result, which is their ratio. The
cost of this is a more complex API, but I think it is worth trying.
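
As a rough sketch of what such a per-statistic interface could look
like, assuming hypothetical names throughout (AggregatableMean and
SimpleMean do not exist in Commons Math):

```java
/** Hypothetical interface: a mean that exposes its intermediate sums. */
interface AggregatableMean {
    /** sum(x^0), the number of values seen. */
    long getN();

    /** sum(x^1). */
    double getSum();

    /** Final result, the ratio getSum() / getN(). */
    double getResult();

    void increment(double x);

    /** Merges the other mean using only the values its interface exposes. */
    void aggregate(AggregatableMean other);
}

/** One possible implementation; others can be plugged in behind the same interface. */
final class SimpleMean implements AggregatableMean {
    private long n;
    private double sum;

    @Override
    public long getN() {
        return n;
    }

    @Override
    public double getSum() {
        return sum;
    }

    @Override
    public double getResult() {
        return n == 0 ? Double.NaN : sum / n;
    }

    @Override
    public void increment(double x) {
        n++;
        sum += x;
    }

    @Override
    public void aggregate(AggregatableMean other) {
        // Only interface accessors are used, so the other side may be
        // a different implementation of the same statistic.
        n += other.getN();
        sum += other.getSum();
    }
}
```

Because aggregate() reads only getN() and getSum(), two different
implementations of the mean can still be merged, and a field typed as
AggregatableMean can no longer receive a variance implementation by
mistake.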

Luc

> 
> My first thought was to define a parameterized Aggregatable interface
> that requires the same types.  Then two SummaryStatistics instances are
> aggregatable iff their implementation statistics match types.  I am OK
> with these restrictions, but am having trouble actually making it work.
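
One possible reading of the parameterized interface described above,
as a hedged sketch only: Aggregatable and RunningMax are hypothetical
names, and this is not taken from the patch attached to MATH-224. The
self-referencing type parameter makes the compiler reject aggregation
between different statistic types.

```java
/** Hypothetical interface: T can only be aggregated with another T. */
interface Aggregatable<T extends Aggregatable<T>> {
    void aggregate(T other);
}

/** Storeless maximum that can absorb another RunningMax. */
final class RunningMax implements Aggregatable<RunningMax> {
    private double max = Double.NaN; // NaN until the first value arrives

    void increment(double x) {
        if (Double.isNaN(max) || x > max) {
            max = x;
        }
    }

    @Override
    public void aggregate(RunningMax other) {
        // An empty other instance contributes nothing.
        if (!Double.isNaN(other.max)) {
            increment(other.max);
        }
    }

    double getResult() {
        return max;
    }
}
```

With this shape, passing some other statistic type to
RunningMax.aggregate() would not compile. What the sketch does not
solve is how SummaryStatistics itself would hold pluggable
implementations while still knowing that both sides match.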
> 
> Suggestions / patches welcome!
> 
> Phil
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
> 

