On Tue, Mar 13, 2018 at 12:47 AM, Gilles <gil...@harfang.homelinux.org>
wrote:

>
>>
>> Where can we find the old code before port into new Commons components?
>>
>
> The code bases are managed by the "git" software; the whole history is
> available:
>   https://git1-us-west.apache.org/repos/asf?p=commons-math.git;a=log
>
> [I'd advise to "clone" the repositories on your local computer, and
> use the command line tools.]


I believe you will want to clone the commons-math repositories, but then
develop your own "fork" of the commons-statistics repository. Gilles can
correct me if that is wrong.


>
>
>> As you mentioned it will be a good approach to redesign process.
>>
>
> You don't necessarily need to analyze how the code was before
> the port/refactoring; looking at how it is now is sufficient,
> unless you suspect that something is wrong now and might have
> been better before. ;-)
>

In particular, the statistics library was designed before Java 8. Java 8,
however, provides both efficient programming strategies for these
statistical methods (in the form of lambdas and streams) and some built-in
methods for summary statistics (see the discussion at
http://markmail.org/message/7t2mjaprsuvb3waj).
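
As a small illustration of the built-in support, here is a minimal sketch
using java.util.DoubleSummaryStatistics from the JDK. The class name
BuiltInSummary and the helper summarize() are made up for this example and
are not part of any Commons component.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

// Hypothetical demo class; only the java.util types are real JDK API.
final class BuiltInSummary {

    // Collects count, sum, min, max and average in one pass over the data.
    static DoubleSummaryStatistics summarize(double[] values) {
        return DoubleStream.of(values).summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics s = summarize(new double[] {1.0, 2.0, 3.0, 4.0});
        System.out.println("n    = " + s.getCount());
        System.out.println("mean = " + s.getAverage());
        System.out.println("min  = " + s.getMin());
        System.out.println("max  = " + s.getMax());
    }
}
```

Note that this built-in class covers count, sum, min, max and average only;
it does not provide variance or quantiles.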

It probably makes sense, as a design strategy, to separate the function
implementation from the streaming implementation. For example, a 2D integer
array will probably require a different streaming implementation than a 1D
double array, but they can probably both be passed the same function
handle to collect, say, the mean or max value.
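
To make the separation concrete, here is a rough sketch, assuming the
statistic is expressed as a function over a DoubleStream and each data
layout gets its own adapter. The names StatSketch, MEAN, MAX and stream()
are hypothetical; this is one possible shape, not a proposed API.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;
import java.util.stream.DoubleStream;

// Hypothetical sketch: statistics and streaming adapters kept apart.
final class StatSketch {

    // Function handles: they know nothing about where the data comes from.
    // Returning NaN for an empty stream is an assumption of this sketch.
    static final ToDoubleFunction<DoubleStream> MEAN =
        s -> s.average().orElse(Double.NaN);
    static final ToDoubleFunction<DoubleStream> MAX =
        s -> s.max().orElse(Double.NaN);

    // Streaming adapter for a 1D double array.
    static DoubleStream stream(double[] data) {
        return Arrays.stream(data);
    }

    // Streaming adapter for a 2D integer array: flatten rows, widen to double.
    static DoubleStream stream(int[][] data) {
        return Arrays.stream(data).flatMapToInt(Arrays::stream).asDoubleStream();
    }

    public static void main(String[] args) {
        double[] oneD = {1.5, 9.0, 3.0};
        int[][] twoD = {{1, 2}, {3, 4}};
        // The same handles work on both layouts.
        System.out.println(MEAN.applyAsDouble(stream(twoD)));
        System.out.println(MAX.applyAsDouble(stream(oneD)));
    }
}
```

Adding a new data layout then only means adding another adapter; the
function handles stay untouched.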

The role of commons might then be to provide a convenient interface, so
that the user can simply call a static method like SummaryStats.mean() and
not have to worry about the implementation.
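
A minimal sketch of what such a facade could look like follows. Only the
name SummaryStats.mean() comes from the paragraph above; the overloads and
the choice to return NaN for empty input are assumptions made for
illustration.

```java
import java.util.Arrays;

// Hypothetical facade: callers pick an overload and never see the streams.
final class SummaryStats {

    private SummaryStats() {
        // Static utility class, not meant to be instantiated.
    }

    // Mean of a 1D double array (or of the given arguments).
    static double mean(double... values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    // Mean over every element of a 2D integer array.
    static double mean(int[][] values) {
        return Arrays.stream(values)
                     .flatMapToInt(Arrays::stream)
                     .average()
                     .orElse(Double.NaN);
    }
}
```

A caller would then write SummaryStats.mean(1.0, 2.0, 3.0) or
SummaryStats.mean(matrix) without caring how the data is traversed.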

The other difficulty I see is that quantile and median statistics will not
be as easy to stream as statistics with a closed-form solution like mean or
variance. There may however be great algorithms out there for pulling the
median or the 95% quantile out of a stream -- if so they should be used.
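
To show where the difficulty lies, here is a sketch of one well-known exact
approach, the two-heap running median. It is not taken from any Commons
code, and the class name RunningMedian is made up. It returns the exact
median after each value but has to keep every value seen so far, so memory
grows with the stream; approximate algorithms trade exactness for bounded
memory.

```java
import java.util.Collections;
import java.util.PriorityQueue;
import java.util.function.DoubleConsumer;

// Hypothetical sketch of the two-heap running median (exact, O(n) memory).
final class RunningMedian implements DoubleConsumer {

    // Max-heap holding the smaller half of the values.
    private final PriorityQueue<Double> lower =
        new PriorityQueue<>(Collections.reverseOrder());
    // Min-heap holding the larger half of the values.
    private final PriorityQueue<Double> upper = new PriorityQueue<>();

    @Override
    public void accept(double value) {
        if (lower.isEmpty() || value <= lower.peek()) {
            lower.add(value);
        } else {
            upper.add(value);
        }
        // Rebalance so that lower has either as many elements as upper,
        // or exactly one more.
        if (lower.size() > upper.size() + 1) {
            upper.add(lower.poll());
        } else if (upper.size() > lower.size()) {
            lower.add(upper.poll());
        }
    }

    // Median of all values accepted so far; NaN if none (an assumption).
    double median() {
        if (lower.isEmpty()) {
            return Double.NaN;
        }
        if (lower.size() > upper.size()) {
            return lower.peek();
        }
        return (lower.peek() + upper.peek()) / 2.0;
    }
}
```

Because the class implements DoubleConsumer, it can be fed from a stream
with something like DoubleStream.of(5, 1, 3).forEach(rm), where rm is a
RunningMedian instance. Unlike mean or variance, there is no fixed-size
state here, which is the point made above.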

Eric
