Hello.

[Please note that the first tag (in square bracket) on the "Subject" line should
be the (short) name of the component, i.e. in this case, "[Statistics]".]

Le mar. 23 juil. 2019 à 20:31, Ben Nguyen <bennguye...@gmail.com> a écrit :
>
> Hello,
>
> Issue was noticed as a PMD violation during PR’s Travis CI build.
> “The class 'OLSResults' is suspected to be a Data Class (WOC=20.000%, NOPA=0, 
> NOAM=4, WMC=6)”
> https://github.com/apache/commons-statistics/pull/23
>
> UML for current design:
> https://github.com/BBenNguyenn/commons-statistics/blob/gsoc-milestone-1/commons-statistics-regression/UML_current.png
>
> The current design is my (probably wrong) interpretation on a suggestion from 
> Eric Barnhill.
> The idea is that all calculation functionality is done in the OLSRegression 
> class, which passes itself to the OLSResults constructor via regress() method.

Better directly show the relevant code:

public class OLSRegression extends AbstractRegression {
    // ...

    public RegressionResults regress() {
        return new OLSResults(this);
    }
}

public class OLSResults implements RegressionResults, Serializable {
    protected OLSResults(OLSRegression ols) {
        this.beta = ols.estimateBeta();
        this.betaVariance = ols.estimateBetaVariance();
        this.betaStandardErrors = ols.estimateBetaStandardErrors();
        this.residuals = ols.estimateResiduals();
        this.regressionStandardErrors = ols.estimateRegressionStandardError();
        this.regressandVariance = ols.estimateRegressandVariance();

        this.hatMatrix = ols.calculateHat().toArray2D();
        this.totalSumOfSquares = ols.calculateTotalSumOfSquares();
        this.residualSumOfSquares = ols.calculateResidualSumOfSquares();
        this.regRSquared = ols.calculateRSquared();
        this.adjustedRSquared = ols.calculateAdjustedRSquared();
    }
}

Why is the constructor "protected"?
Why is the computation performed in that constructor although the data is
stored in "OLSRegression"? [This looks like what PMD complains about.]

> The OLSResults (implementing RegressionResults interface) calls on statistics 
> calculating methods from OLSRegression and saves it as private fields with 
> getters.

It's good to try and have encapsulation preserved through accessors but in
that case it's broken because some return a reference to an array (entries
are mutable).

Implementing an interface "RegressionResults" suggest that application
should program against it, and not the specific "OLSResult"; however the
latter has more methods than the interface defines.

> I am thinking this is good for users who want everything to be calculated 
> only once to then be used multiple times (without the recalculation or having 
> to save all statistics themselves).

What is the most useful functionality?
What is the most robust functionality?
IOW:  Is there a case for not computing everything?

These are two of the methods called in the constructor above:

    public double calculateRSquared() {
        return 1 - calculateResidualSumOfSquares() /
calculateTotalSumOfSquares();
    }

    public double calculateAdjustedRSquared() {
        final double n = getX().numRows();
        if (getHasIntercept()) {
            return 1 - (1 - calculateRSquared()) * (n / (n - getX().numCols()));
        } else {
            return 1 -
                   (calculateResidualSumOfSquares() * (n - 1)) /
                       (calculateTotalSumOfSquares() * (n - getX().numCols()));
        }
    }

There are many redundant computations...

> The RegressionResults interface could also include some kind of output method 
> to get all statistics in a file or displayed?

Why should it be done within the class if it provides accessors to all
its contents?

> Then, for those who only want a single calculated statistic, the methods to 
> calculate in OLSRegression are also public.

Then even assuming that you fix somehow "OLSResults" to compute
each result only once, a user who directly calls the above two methods
will get subpar performance.

> This design does cause that PMD error, so any suggestions or alternate 
> designs to correct my interpretation?

There are definite problems with your implementation that need fixing;
but it's not necessarily proof of a bad design.
However it's clearly not the most obvious one, which IMO would go along
the lines

RegressionData data = RegressionDataLoader.from(x, y, hasIntercept);
Regression reg = new OLSRegression();
RegressionResults res = reg.regress(data);

This is all untested of course, and could well fall short for advanced
usage (stream?).  Your design (or yet another) could be an attempt
to handle particular use-cases which the obvious approach does not
cover (?) ...

Regards,
Gilles

>
> Thank you,
> Cheers,
> -Ben Nguyen
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to