Hello Dr. Paul King,

I am working on the new regression module for Commons Statistics as a student 
in GSoC. I had a brief look at your Groovy Data Science (which I will have to 
look at more deeply in the future because it’s an interesting and high-quality 
tutorial/showcase), and noticed that in your slides you mentioned the 7 main 
types of regression. One of the central purposes of this new Commons Statistics 
Regression component is to design an architecture which can support these 
different types by allowing a good base for other developers to append more 
regression types beyond just OLS and GLS in math3.

Currently I’m designing toward this goal, using OLS as a starting point and
EJML for matrix operations (instead of math3.linear). The plan is to have OLS,
GLS and Logistic done by around the end of August, and to add other regression
types in the future, hopefully with other developers.
The updating regressions you’ve used, like SimpleRegression, will likely stay
as is for now, unless you have suggestions for them?

I also wanted to take this opportunity to ask you, as a user:
1. What would make your life easier?
2. What features should definitely be kept?
   a. Do you value the current data input interface (with just newSampleData()
      called directly on the OLS class)?
   b. Or would you consider one of the other approaches mentioned, which would
      be needed if reusing the same loaded data across different types of
      regression is important?
3. What features should be improved?
   a. Do you consider the current running time sufficient, or is it restrictive
      for you in any way? (Hopefully EJML will help a bit in that regard;
      benchmarks will perhaps be made after OLS is done.)
4. Any suggestions/requests for specific features?
   a. Perhaps a summary printout under a RegressionResults interface?
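
To make questions 2b and 4a more concrete, here is a minimal Java sketch of the
grammar proposed further down this thread. It is only an illustration, not the
actual Commons Statistics code: the accessor names y() and x(), the summary()
method, the argument order of of(), and the plain Gauss-Jordan solve (used here
in place of EJML so the example has no dependencies) are all assumptions.

```java
import java.util.Arrays;

// Hypothetical shapes for the interfaces named in the proposed grammar.
interface RegressionData {
    double[] y();    // response values, one per observation
    double[][] x();  // predictors, one row per observation, no intercept column
}

interface RegressionResults {
    double[] getBetas();  // intercept first, then one slope per predictor
    String summary();     // assumed name for the summary printout (question 4a)
}

interface Regression {
    RegressionResults regress(RegressionData data);
}

// Factory: the data is loaded once and can be passed to any Regression (question 2b).
final class RegressionDataLoader {
    private RegressionDataLoader() { }

    static RegressionData of(double[] y, double[][] x) {
        if (y.length == 0 || y.length != x.length) {
            throw new IllegalArgumentException("y and x need the same non-zero number of rows");
        }
        final double[] yCopy = y.clone();
        final double[][] xCopy = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            xCopy[i] = x[i].clone();
        }
        return new RegressionData() {
            @Override public double[] y() { return yCopy; }
            @Override public double[][] x() { return xCopy; }
        };
    }
}

final class OLSResults implements RegressionResults {
    private final double[] betas;

    OLSResults(double[] betas) { this.betas = betas.clone(); }

    @Override public double[] getBetas() { return betas.clone(); }

    @Override public String summary() {
        return "OLS betas (intercept, slopes): " + Arrays.toString(betas);
    }
}

final class OLSRegression implements Regression {
    @Override
    public RegressionResults regress(RegressionData data) {
        double[] y = data.y();
        double[][] x = data.x();
        int k = x[0].length + 1;  // +1 for the intercept
        // Accumulate the augmented normal equations [X'X | X'y].
        double[][] a = new double[k][k + 1];
        for (int r = 0; r < y.length; r++) {
            double[] row = new double[k];
            row[0] = 1.0;
            System.arraycopy(x[r], 0, row, 1, k - 1);
            for (int i = 0; i < k; i++) {
                for (int j = 0; j < k; j++) {
                    a[i][j] += row[i] * row[j];
                }
                a[i][k] += row[i] * y[r];
            }
        }
        // Gauss-Jordan elimination with partial pivoting.
        for (int c = 0; c < k; c++) {
            int p = c;
            for (int r = c + 1; r < k; r++) {
                if (Math.abs(a[r][c]) > Math.abs(a[p][c])) {
                    p = r;
                }
            }
            double[] tmp = a[c];
            a[c] = a[p];
            a[p] = tmp;
            if (Math.abs(a[c][c]) < 1e-12) {
                throw new ArithmeticException("X'X is singular");
            }
            for (int r = 0; r < k; r++) {
                if (r == c) {
                    continue;
                }
                double f = a[r][c] / a[c][c];
                for (int j = c; j <= k; j++) {
                    a[r][j] -= f * a[c][j];
                }
            }
        }
        double[] betas = new double[k];
        for (int i = 0; i < k; i++) {
            betas[i] = a[i][k] / a[i][i];
        }
        return new OLSResults(betas);
    }
}
```

With this shape, the same RegressionData object could be handed to an
OLSRegression and, later, to a GLS or logistic implementation without
reloading, and results.summary() would be the hook for a printout.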

Thank you for your time, I appreciate any input you can give me.

Cheers,
-Ben Nguyen

From: Paul King
Sent: Friday, July 19, 2019 6:26 AM
To: Commons Developers List
Subject: Re: [statistics] Proposed OLS grammar

There are about 10 files using classes from the math3.stat package in
the examples I mentioned. I have stayed away from math4 while it's
still snapshot.

Repo: https://github.com/paulk-asert/groovy-data-science

Slides: https://speakerdeck.com/paulk/groovy-data-science

Most of the examples are in the subprojects/HousePrices project with a
few others just using StatUtil.

It's not my full-time day job to be using those classes but I'd be
keen to have those examples working nicely.

Cheers, Paul.

On Fri, Jul 19, 2019 at 9:11 PM Gilles Sadowski <gillese...@gmail.com> wrote:
>
> Hi.
>
> Your experience as a user of "Commons Math" would be most useful
> to help us craft a better (or, at least, no worse) design for "Commons
> Statistics".
> Would you share pointers to actual use-cases?
>
> Thanks,
> Gilles
>
> 2019-07-19 7:03 UTC+02:00, Paul King <paul.king.as...@gmail.com>:
> > Cool. I'd be keen to try out the API, when you are ready, in my
> > "Apache Groovy for data science" examples which currently use the
> > commons math3 classes.
> >
> > Cheers, Paul.
> >
> > On Fri, Jul 19, 2019 at 9:51 AM Gilles Sadowski <gillese...@gmail.com>
> > wrote:
> >>
> >> Hi.
> >>
> >> On Fri, Jul 19, 2019 at 01:45, Paul King <paul.king.as...@gmail.com>
> >> wrote:
> >> >
> >> > How does this relate to the OLS classes in commons math?
> >> > https://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/stat/regression/OLSMultipleLinearRegression.html
> >>
> >> The new "Commons Statistics" component purports to replace the
> >> functionality currently defined in the package
> >> "org.apache.commons.math4.stat" of "Commons Math".
> >>
> >> Regards,
> >> Gilles
> >>
> >> > On Fri, Jul 19, 2019 at 8:50 AM Eric Barnhill <ericbarnh...@gmail.com>
> >> > wrote:
> >> > >
> >> > > I suggested the following grammar to aim for in our meeting today with
> >> > > the developing OLS module. If you see anything you'd prefer to change,
> >> > > let's establish it now; if anyone doesn't like it later, it's on me.
> >> > >
> >> > > RegressionData data = RegressionDataLoader.of(double[][] y, double[] x);
> >> > > Regression ols = new OLSRegression();
> >> > > RegressionResults results = ols.regress(data);
> >> > > betas = results.getBetas();
> >> > >
> >> > > where:
> >> > > RegressionData is an interface
> >> > > RegressionDataLoader is a factory class and of() a (possibly overloaded)
> >> > > static method
> >> > > Regression is an interface, implemented by OLSRegression
> >> > > RegressionResults is an interface, the specific class returned is
> >> > > OLSResults which implements it.
> >> > > betas are the intercept and slopes of the regression model
> >> > >
> >> > > I think this preserves abstraction at the levels desired, since we will
> >> > > want future flexibility as to regression type, possible state parameters
> >> > > set on the regression object, and results contents and format. But it
> >> > > also doesn't take on any unnecessary abstractions.
> >> > >
> >> > > Eric
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>


