Hello. [Please note that the first tag (in square bracket) on the "Subject" line should be the (short) name of the component, i.e. in this case, "[Statistics]".]
Le mar. 23 juil. 2019 à 20:31, Ben Nguyen <bennguye...@gmail.com> a écrit : > > Hello, > > Issue was noticed as a PMD violation during PR’s Travis CI build. > “The class 'OLSResults' is suspected to be a Data Class (WOC=20.000%, NOPA=0, > NOAM=4, WMC=6)” > https://github.com/apache/commons-statistics/pull/23 > > UML for current design: > https://github.com/BBenNguyenn/commons-statistics/blob/gsoc-milestone-1/commons-statistics-regression/UML_current.png > > The current design is my (probably wrong) interpretation on a suggestion from > Eric Barnhill. > The idea is that all calculation functionality is done in the OLSRegression > class, which passes itself to the OLSResults constructor via regress() method. Better directly show the relevant code: public class OLSRegression extends AbstractRegression { // ... public RegressionResults regress() { return new OLSResults(this); } } public class OLSResults implements RegressionResults, Serializable { protected OLSResults(OLSRegression ols) { this.beta = ols.estimateBeta(); this.betaVariance = ols.estimateBetaVariance(); this.betaStandardErrors = ols.estimateBetaStandardErrors(); this.residuals = ols.estimateResiduals(); this.regressionStandardErrors = ols.estimateRegressionStandardError(); this.regressandVariance = ols.estimateRegressandVariance(); this.hatMatrix = ols.calculateHat().toArray2D(); this.totalSumOfSquares = ols.calculateTotalSumOfSquares(); this.residualSumOfSquares = ols.calculateResidualSumOfSquares(); this.regRSquared = ols.calculateRSquared(); this.adjustedRSquared = ols.calculateAdjustedRSquared(); } } Why is the constructor "protected"? Why is the computation performed in that constructor although the data is stored in "OLSRegression"? [This looks like what PMD complains about.] > The OLSResults (implementing RegressionResults interface) calls on statistics > calculating methods from OLSRegression and saves it as private fields with > getters. It's good to try and have encapsulation preserved through accessors but in that case it's broken because some return a reference to an array (entries are mutable). Implementing an interface "RegressionResults" suggest that application should program against it, and not the specific "OLSResult"; however the latter has more methods than the interface defines. > I am thinking this is good for users who want everything to be calculated > only once to then be used multiple times (without the recalculation or having > to save all statistics themselves). What is the most useful functionality? What is the most robust functionality? IOW: Is there a case for not computing everything? These are two of the methods called in the constructor above: public double calculateRSquared() { return 1 - calculateResidualSumOfSquares() / calculateTotalSumOfSquares(); } public double calculateAdjustedRSquared() { final double n = getX().numRows(); if (getHasIntercept()) { return 1 - (1 - calculateRSquared()) * (n / (n - getX().numCols())); } else { return 1 - (calculateResidualSumOfSquares() * (n - 1)) / (calculateTotalSumOfSquares() * (n - getX().numCols())); } } There are many redundant computations... > The RegressionResults interface could also include some kind of output method > to get all statistics in a file or displayed? Why should it be done within the class if it provides accessors to all its contents? > Then, for those who only want a single calculated statistic, the methods to > calculate in OLSRegression are also public. Then even assuming that you fix somehow "OLSResults" to compute each result only once, a user who directly calls the above two methods will get subpar performance. > This design does cause that PMD error, so any suggestions or alternate > designs to correct my interpretation? There are definite problems with your implementation that need fixing; but it's not necessarily proof of a bad design. However it's clearly not the most obvious one, which IMO would go along the lines RegressionData data = RegressionDataLoader.from(x, y, hasIntercept); Regression reg = new OLSRegression(); RegressionResults res = reg.regress(data); This is all untested of course, and could well fall short for advanced usage (stream?). Your design (or yet another) could be an attempt to handle particular use-cases which the obvious approach does not cover (?) ... Regards, Gilles > > Thank you, > Cheers, > -Ben Nguyen > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org