Hello.
On Tue, 28 May 2019 at 20:36, Alex Herbert <[email protected]> wrote:
>
>
>
> > On 28 May 2019, at 18:09, Eric Barnhill <[email protected]> wrote:
> >
> > The previous commons-math interface for descriptive statistics used a
> > paradigm of constructing classes for various statistical functions and
> > calling evaluate(). Example
> >
> > Mean mean = new Mean();
> > double mn = mean.evaluate(double[])
> >
> > I wrote this type of code all through grad school and always found it
> > unnecessarily bulky. To me these summary statistics are classic use cases
> > for static methods:
> >
> > double mean = Mean.evaluate(double[])
> >
> > I don't have any particular problem with the evaluate() syntax.
> >
> > I looked over the old Math 4 API to see if there were any benefits to the
> > previous class-oriented approach that we might not want to lose. But I
> > don't think there were; the functionality outside of evaluate() is minimal.
>
> A quick check shows that evaluate comes from UnivariateStatistic. This has
> some more methods that add little to an instance view of the computation:
>
> double evaluate(double[] values) throws MathIllegalArgumentException;
> double evaluate(double[] values, int begin, int length) throws
> MathIllegalArgumentException;
> UnivariateStatistic copy();
>
> However it is extended by StorelessUnivariateStatistic which adds methods to
> update the statistic:
>
> void increment(double d);
> void incrementAll(double[] values) throws MathIllegalArgumentException;
> void incrementAll(double[] values, int start, int length) throws
> MathIllegalArgumentException;
> double getResult();
> long getN();
> void clear();
> StorelessUnivariateStatistic copy();
>
> This type of functionality would be lost by static methods.
>
> If you are moving to a functional interface type pattern for each statistic
> then you will lose the other functionality possible with an instance state,
> namely updating with more values or combining instances.
>
> So this is a question of whether updating a statistic is required after the
> first computation.
>
> Will there be an alternative in the library for a map-reduce type operation
> using instances that can be combined using Stream.collect:
>
> <R> R collect(Supplier<R> supplier,
> ObjDoubleConsumer<R> accumulator,
> BiConsumer<R, R> combiner);
>
> Here <R> would be Mean:
>
> double mean = Arrays.stream(new double[1000])
>     .collect(Mean::new, Mean::add, Mean::add)
>     .getMean();
>
> with:
>
> void add(double);
> void add(Mean);
> double getMean();
>
> (Untested code)
>
> >
> > Finally we should consider whether we really need a separate class for each
> > statistic at all. Do we want to call:
> >
> > Mean.evaluate()
> >
> > or
> >
> > SummaryStats.mean()
> >
> > or maybe
> >
> > Stats.mean() ?
> >
> > The last being nice and compact.
> >
> > Let's make a decision so our esteemed mentee Virendra knows in what
> > direction to take his work this summer. :)
>
I'm not sure I understand the implicit conclusions of this conversation
and the other one there:
https://markmail.org/message/7dmyhzuy6lublyb5
Do we agree that the core issue is *not* how to compute a mean, or a
median, or a fourth moment, but how any and all of those can be
computed seamlessly through a functional API (stream)?
As Alex pointed out, a useful functionality is the ability to "combine"
instances, e.g. if data are collected by several threads.
Another potential use case is retrieving the current value of any
statistical quantity while data are still being collected.
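To make the "combine" idea concrete, here is a minimal sketch of a mean that can be both updated and merged, so that it fits DoubleStream.collect. The names RunningMean and RunningMeanDemo are hypothetical (modelled on the add/getMean signatures Alex suggested); this is not an existing Commons API:

```java
import java.util.stream.DoubleStream;

/** Hypothetical combinable mean; a sketch, not the Commons Math API. */
final class RunningMean {
    private long n;
    private double mean;

    /** Incremental update with a single value. */
    void add(double x) {
        n++;
        mean += (x - mean) / n;
    }

    /** Merge the state of another instance (e.g. one filled by another thread). */
    void add(RunningMean other) {
        if (other.n == 0) {
            return; // Nothing to merge.
        }
        long total = n + other.n;
        // Weighted update: shift the mean towards the other instance's mean.
        mean += (other.mean - mean) * other.n / total;
        n = total;
    }

    long getN() {
        return n;
    }

    /** @return the mean of the values seen so far, or NaN if there are none. */
    double getMean() {
        return n == 0 ? Double.NaN : mean;
    }
}

final class RunningMeanDemo {
    /** Map-reduce style computation through the stream API. */
    static double meanOf(double[] values) {
        return DoubleStream.of(values)
            .parallel()
            .collect(RunningMean::new, RunningMean::add, RunningMean::add)
            .getMean();
    }
}
```

The same instance could also be kept around and updated with further calls to add(double), which is the functionality a purely static Mean.evaluate(double[]) would not offer.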
An initial idea would be:
public interface StatQuantity {
    double value(double[] data);     // For "basic" usage.
    double value(DoubleStream data); // For "advanced" usage.
}

public class StatCollection {
    /** Specify which quantities this collection will hold/compute. */
    public StatCollection(Map<String, StatQuantity> stats) { /* ... */ }

    /**
     * Start a worker thread.
     * @param data Values for which the stat quantities must be computed.
     */
    public void startCollector(DoubleStream data) { /* ... */ }

    /** Combine current state of workers. */
    public void collect() { /* ... */ }

    /** @return the current (combined) value of a named quantity. */
    public double get(String name) { /* ... */ }

    private class StatCollector implements Callable {
        StatCollector(DoubleStream data) { /* ... */ }
    }
}
This is all totally untested, very partial, and probably wrong-headed but
I thought that we were looking at this kind of refactoring.
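
For what it's worth, a minimal runnable variant of this outline might look like the following. It is only a sketch under several assumptions: it uses the JDK's DoubleSummaryStatistics as the per-worker accumulator instead of a StatQuantity interface, the names StatCollectionSketch and shutdown() are made up, and collect() blocks until every worker has finished, so it does not yet cover reading values while data are still arriving:

```java
import java.util.ArrayList;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToDoubleFunction;
import java.util.stream.DoubleStream;

/** Hypothetical sketch of the outline above; names are illustrative only. */
final class StatCollectionSketch {
    /** Named quantities, each read from the combined accumulator. */
    private final Map<String, ToDoubleFunction<DoubleSummaryStatistics>> stats;
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final List<Future<DoubleSummaryStatistics>> workers = new ArrayList<>();
    private final DoubleSummaryStatistics combined = new DoubleSummaryStatistics();

    StatCollectionSketch(Map<String, ToDoubleFunction<DoubleSummaryStatistics>> stats) {
        this.stats = stats;
    }

    /** Start a worker that summarises the given (finite) stream. */
    void startCollector(DoubleStream data) {
        Callable<DoubleSummaryStatistics> task = data::summaryStatistics;
        workers.add(pool.submit(task));
    }

    /** Wait for all started workers and merge their results. */
    void collect() {
        try {
            for (Future<DoubleSummaryStatistics> worker : workers) {
                combined.combine(worker.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while collecting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Worker failed", e.getCause());
        } finally {
            workers.clear();
        }
    }

    /** @return the combined value of a named quantity (after collect()). */
    double get(String name) {
        return stats.get(name).applyAsDouble(combined);
    }

    /** Release the worker threads. */
    void shutdown() {
        pool.shutdown();
    }
}
```

A caller would register quantities such as "mean" -> DoubleSummaryStatistics::getAverage, call startCollector() once per data source, then collect() and get("mean"). Anything beyond count/sum/min/max/average (median, higher moments) would need a different, still combinable, accumulator.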
Regards,
Gilles