We should be able to find a clean way to do what this enhancement
request is asking for. I am feeling stupid because even when I consider
breaking compatibility / refactoring to use generics, I can't find a
simple way to do it. Here is a description of the current API and some
failed ideas that I have considered so far. As usual, I would like to
minimize pain for current users in addressing this, but at this point I
am starting to think that wholesale refactoring is necessary and I would
appreciate ideas on the best way to do this.

SummaryStatistics provides "storeless" computation of summary statistics
(min, max, mean, variance, etc.). Here "storeless" means that the class
does not hold the stream of data in memory. It was designed to support
pluggable implementations of the statistics that it computes. It does
this in a way that looks smelly in the new world of type-safe Java
(well, maybe it always smelled ;). The injectable implementation classes
in SummaryStatistics are typed as "StorelessUnivariateStatistic", which
is an interface that includes things like getResult() and
increment(double). There is nothing preventing, for example, a variance
implementation from being "plugged in" to implement the mean.
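
To make the type-safety hole concrete, here is a minimal sketch. The
types below are hypothetical, stripped-down stand-ins (every name ending
in "Sketch" is invented for illustration) and are not the actual Commons
Math classes; only increment(double) and getResult() are taken from the
description above.

```java
// Hypothetical stand-in for the statistic interface described above.
interface StorelessStatSketch {
    void increment(double d);
    double getResult();
}

// Running mean, updated one value at a time without storing the data.
final class MeanSketch implements StorelessStatSketch {
    private long n;
    private double mean;

    public void increment(double d) {
        n++;
        mean += (d - mean) / n;
    }

    public double getResult() {
        return n == 0 ? Double.NaN : mean;
    }
}

// Running sample variance (Welford-style update), also storeless.
final class VarianceSketch implements StorelessStatSketch {
    private long n;
    private double mean;
    private double m2; // sum of squared deviations from the running mean

    public void increment(double d) {
        n++;
        double delta = d - mean;
        mean += delta / n;
        m2 += delta * (d - mean);
    }

    public double getResult() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }
}

// Hypothetical container: the "mean" slot accepts ANY statistic.
final class SummarySketch {
    private StorelessStatSketch meanImpl = new MeanSketch();

    void setMeanImpl(StorelessStatSketch impl) {
        meanImpl = impl; // nothing checks that impl really computes a mean
    }

    void addValue(double d) {
        meanImpl.increment(d);
    }

    double getMean() {
        return meanImpl.getResult();
    }
}
```

With this sketch, calling setMeanImpl(new VarianceSketch()) compiles
without complaint, and getMean() then quietly reports a variance.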

The request in MATH-224 is to support aggregation in the following
sense: SummaryStatistics instance 1 gets one stream of values, instance
2 gets another, and we want to create a new instance (or replace
instance 1 with an instance) that behaves as though it had received all
the data from both streams. The simplest way to do this
would be to add an "aggregate" method to the
StorelessUnivariateStatistic interface and then just implement
aggregation in SummaryStatistics by delegation to the implementation
instances. This is essentially what the patch attached to MATH-224
does. The problem with this approach is that supporting aggregation is
a fairly strong requirement in general, stronger than just requiring
that the statistic be computable without storing the data. Stronger
still is the requirement that an implementation of a statistic be
"aggregatable" with a possibly different implementation, since it would
then have access only to the value reported by the other statistic
(e.g. via getResult()) and not to its internal state.
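
As an illustration of why aggregation asks more of an implementation
than storeless computation does, here is a minimal sketch of merging
two mean/variance accumulators using the standard pairwise combination
formula. The class and method names are hypothetical, and this is not
the code from the MATH-224 patch. Note that the merge reads the other
instance's count, mean and sum of squared deviations; the other
instance's reported variance alone would not be enough.

```java
// Hypothetical accumulator holding the state needed for mean and variance.
final class MomentStateSketch {
    private long n;
    private double mean;
    private double m2; // sum of squared deviations from the mean

    // Storeless single-value update (Welford-style).
    void increment(double d) {
        n++;
        double delta = d - mean;
        mean += delta / n;
        m2 += delta * (d - mean);
    }

    // Make this instance behave as though it had also seen other's data.
    // Requires other's internal state (n, mean, m2), not just its results.
    void aggregate(MomentStateSketch other) {
        if (other.n == 0) {
            return; // nothing to merge
        }
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        // Pairwise combination of the two partial results.
        m2 += other.m2 + delta * delta * ((double) n * other.n / total);
        mean += delta * other.n / total;
        n = total;
    }

    long getN() {
        return n;
    }

    double getMean() {
        return n == 0 ? Double.NaN : mean;
    }

    double getVariance() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }
}
```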

So the challenge is: can we find a clean way to achieve the following
four objectives?
0) Maintain pluggability of statistics implementations
1) Support aggregation
2) Improve type safety
3) Minimize trauma for current users

Dropping 0) makes things much simpler, but I would like to avoid that
unless there is really no way to accomplish 1) and 2) without taking
that step. Strictly speaking, 1) may be impossible in full, as I know of
no way to aggregate the higher moments. I would be OK with aggregation
forcing these to NaN (documented, of course).

My first thought was to define a parameterized Aggregatable interface
whose aggregate method accepts only an instance of the same type. Then
two SummaryStatistics instances are aggregatable iff the types of their
implementation statistics match. I am OK with these restrictions, but am
having trouble actually making it work.
Suggestions / patches welcome!
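
For discussion, here is one rough sketch of what such a parameterized
interface might look like. All names are hypothetical and this is not a
proposed patch. It also shows one place where the idea may get awkward
(an assumption on my part about where it breaks down): once the
implementations are stored behind the interface type to keep them
pluggable, the "same type" requirement has to be checked at runtime.

```java
// Hypothetical self-typed interface: aggregate() only accepts the same type.
interface AggregatableSketch<T extends AggregatableSketch<T>> {
    void increment(double d);
    double getResult();
    void aggregate(T other);
}

final class SumSketch implements AggregatableSketch<SumSketch> {
    private double sum;

    public void increment(double d) { sum += d; }
    public double getResult() { return sum; }
    public void aggregate(SumSketch other) { sum += other.sum; }
}

final class MinSketch implements AggregatableSketch<MinSketch> {
    private double min = Double.POSITIVE_INFINITY;

    public void increment(double d) { min = Math.min(min, d); }
    public double getResult() { return min; }
    public void aggregate(MinSketch other) { min = Math.min(min, other.min); }
}

// Hypothetical container with a pluggable "sum" slot.
final class AggregatingSummarySketch {
    // The concrete type parameter is lost here to keep the slot pluggable.
    private AggregatableSketch<?> sumImpl = new SumSketch();

    void setSumImpl(AggregatableSketch<?> impl) { sumImpl = impl; }
    void addValue(double d) { sumImpl.increment(d); }
    double getSum() { return sumImpl.getResult(); }

    // Aggregatable iff the implementation classes match exactly.
    boolean canAggregate(AggregatingSummarySketch other) {
        return sumImpl.getClass() == other.sumImpl.getClass();
    }

    @SuppressWarnings({"unchecked", "rawtypes"})
    void aggregate(AggregatingSummarySketch other) {
        if (!canAggregate(other)) {
            throw new IllegalArgumentException("implementation types differ");
        }
        // Raw-type call; safe only because of the runtime class check above.
        ((AggregatableSketch) sumImpl).aggregate((AggregatableSketch) other.sumImpl);
    }
}
```

The individual statistics get a compile-time guarantee (a SumSketch can
only be merged with a SumSketch), but the container falls back on a
class comparison and an unchecked call, and nothing stops a MinSketch
from being plugged into the "sum" slot in the first place.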
Phil