Hello.

On Tue, 28 May 2019 at 20:36, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>
> > On 28 May 2019, at 18:09, Eric Barnhill <ericbarnh...@gmail.com> wrote:
> >
> > The previous commons-math interface for descriptive statistics used a
> > paradigm of constructing classes for various statistical functions and
> > calling evaluate(). Example
> >
> > Mean mean = new Mean();
> > double mn = mean.evaluate(double[])
> >
> > I wrote this type of code all through grad school and always found it
> > unnecessarily bulky. To me these summary statistics are classic use cases
> > for static methods:
> >
> > double mean = Mean.evaluate(double[])
> >
> > I don't have any particular problem with the evaluate() syntax.
> >
> > I looked over the old Math 4 API to see if there were any benefits to the
> > previous class-oriented approach that we might not want to lose. But I
> > don't think there were, the functionality outside of evaluate() is minimal.
>
> A quick check shows that evaluate comes from UnivariateStatistic. This has
> some more methods that add little to an instance view of the computation:
>
> double evaluate(double[] values) throws MathIllegalArgumentException;
> double evaluate(double[] values, int begin, int length) throws
> MathIllegalArgumentException;
> UnivariateStatistic copy();
>
> However it is extended by StorelessUnivariateStatistic which adds methods to
> update the statistic:
>
> void increment(double d);
> void incrementAll(double[] values) throws MathIllegalArgumentException;
> void incrementAll(double[] values, int start, int length) throws
> MathIllegalArgumentException;
> double getResult();
> long getN();
> void clear();
> StorelessUnivariateStatistic copy();
>
> This type of functionality would be lost by static methods.
>
> If you are moving to a functional interface type pattern for each statistic
> then you will lose the other functionality possible with an instance state,
> namely updating with more values or combining instances.
>
> So this is a question of whether updating a statistic is required after the
> first computation.
>
> Will there be an alternative in the library for a map-reduce type operation
> using instances that can be combined using Stream.collect:
>
> <R> R collect(Supplier<R> supplier,
>               ObjDoubleConsumer<R> accumulator,
>               BiConsumer<R, R> combiner);
>
> Here <R> would be Mean:
>
> double mean = Arrays.stream(new double[1000]).collect(Mean::new, Mean::add,
> Mean::add).getMean()
>
> with:
>
> void add(double);
> void add(Mean);
> double getMean();
>
> (Untested code)
>
> > Finally we should consider whether we really need a separate class for each
> > statistic at all. Do we want to call:
> >
> > Mean.evaluate()
> >
> > or
> >
> > SummaryStats.mean()
> >
> > or maybe
> >
> > Stats.mean() ?
> >
> > The last being nice and compact.
> >
> > Let's make a decision so our esteemed mentee Virendra knows in what
> > direction to take his work this summer. :)
>
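The collect-based pattern quoted above can be made concrete with a small sketch. This is only an illustration under stated assumptions: CombinableMean is a hypothetical class (it is not the Commons Math Mean), and its fields and merge formula are my own choice; only the add(double) / add(instance) / getMean() shape comes from the quoted message.

```java
import java.util.Arrays;

/** Hypothetical combinable mean (assumption; not the Commons Math "Mean" class). */
final class CombinableMean {
    private long n;       // number of values seen so far
    private double mean;  // running mean of those values

    /** Accumulator: fold one value into the running mean. */
    void add(double x) {
        n++;
        mean += (x - mean) / n;
    }

    /** Combiner: merge another partial result into this one. */
    void add(CombinableMean other) {
        if (other.n == 0) {
            return; // nothing to merge
        }
        long total = n + other.n;
        // Weighted update: shift this mean towards the other by its share of the data.
        mean += (other.mean - mean) * other.n / total;
        n = total;
    }

    /** @return the mean of all values seen, or NaN if there are none. */
    double getMean() {
        return n == 0 ? Double.NaN : mean;
    }

    long getN() {
        return n;
    }
}

final class CollectDemo {
    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4};
        // DoubleStream.collect(supplier, accumulator, combiner); the combiner
        // is only exercised when the stream is processed in parallel.
        double m = Arrays.stream(data)
            .parallel()
            .collect(CombinableMean::new, CombinableMean::add, CombinableMean::add)
            .getMean();
        System.out.println(m);
    }
}
```

The two add overloads are what let the same method reference serve as both accumulator and combiner; a static Mean.evaluate(double[]) could coexist with this as a convenience wrapper.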
I'm not sure I understand the implicit conclusions of this conversation
and of the other one here:
  https://markmail.org/message/7dmyhzuy6lublyb5

Do we agree that the core issue is *not* how to compute a mean, or a
median, or a fourth moment, but how any and all of those can be computed
seamlessly through a functional API (stream)?

As Alex pointed out, a useful feature is the ability to "combine"
instances, e.g. if data are collected by several threads.
A potential use-case is retrieving the current value of any statistical
quantity while the data continues to be collected.

An initial idea would be:

    public interface StatQuantity {
        public double value(double[] data);     // For "basic" usage.
        public double value(DoubleStream data); // For "advanced" usage.
    }

    public class StatCollection {
        /** Specify which quantities this collection will hold/compute. */
        public StatCollection(Map<String, StatQuantity> stats) { /* ... */ }

        /**
         * Start a worker thread.
         * @param data Values for which the stat quantities must be computed.
         */
        public void startCollector(DoubleStream data) { /* ... */ }

        /** Combine current state of workers. */
        public void collect() { /* ... */ }

        /** @return the current (combined) value of a named quantity. */
        public double get(String name) { /* ... */ }

        private class StatCollector implements Callable {
            StatCollector(DoubleStream data) { /* ... */ }
        }
    }

This is all totally untested, very partial, and probably wrong-headed, but
I thought that we were looking at this kind of refactoring.

Regards,
Gilles
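As a point of comparison for the "combine the state of several workers" idea, here is a minimal sketch that uses only the JDK. It is not the proposed StatCollection API: CombineWorkersDemo and its collect method are hypothetical names, and java.util.DoubleSummaryStatistics stands in for whatever combinable statistic the library would provide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.DoubleStream;

/** Hypothetical demo class (assumption; not part of Commons Math). */
final class CombineWorkersDemo {
    /** Each worker summarises its own chunk; the partial results are merged afterwards. */
    static DoubleSummaryStatistics collect(List<double[]> chunks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, chunks.size()));
        try {
            List<Future<DoubleSummaryStatistics>> partials = new ArrayList<>();
            for (double[] chunk : chunks) {
                // One task per data source, each with its own private state.
                partials.add(pool.submit(() -> DoubleStream.of(chunk).summaryStatistics()));
            }
            DoubleSummaryStatistics combined = new DoubleSummaryStatistics();
            for (Future<DoubleSummaryStatistics> partial : partials) {
                combined.combine(partial.get()); // waits for the worker, then merges
            }
            return combined;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<double[]> chunks = Arrays.asList(new double[] {1, 2}, new double[] {3, 4, 5});
        DoubleSummaryStatistics s = collect(chunks);
        System.out.println(s.getCount() + " values, mean " + s.getAverage());
    }
}
```

This sketch merges only after every worker has finished. Reading a combined value while workers are still running would additionally need synchronised access to (or snapshots of) each worker's partial state, which is left out here.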