Okay, my bad. I thought I had looked through the list of enums, but I obviously was not as thorough as I should have been. Will make the changes that you suggest.
Thank you, -Greg On Fri, Sep 23, 2011 at 5:51 AM, Gilles Sadowski < gil...@harfang.homelinux.org> wrote: > Hello. > > > Author: gregs > > Date: Fri Sep 23 03:36:11 2011 > > New Revision: 1174509 > > > > URL: http://svn.apache.org/viewvc?rev=1174509&view=rev > > Log: > > JIRA: MATH-607 Adding support for UpdatingMultipleLinearRegression to > SimpleRegression > > > > Modified: > > > commons/proper/math/trunk/src/main/java/org/apache/commons/math/exception/util/LocalizedFormats.java > > Part of the rationale for implementing exception objects was to reduce the > number of overlapping or redundant messages in the "LocalizedFormats" enum. > > As I had indicated in the numerous discussions related to this subject, the > sheer size of that list of messages induces one to add more. This has just > happened again. > > > > > Modified: > commons/proper/math/trunk/src/main/java/org/apache/commons/math/exception/util/LocalizedFormats.java > > URL: > http://svn.apache.org/viewvc/commons/proper/math/trunk/src/main/java/org/apache/commons/math/exception/util/LocalizedFormats.java?rev=1174509&r1=1174508&r2=1174509&view=diff > > > ============================================================================== > > --- > commons/proper/math/trunk/src/main/java/org/apache/commons/math/exception/util/LocalizedFormats.java > (original) > > +++ > commons/proper/math/trunk/src/main/java/org/apache/commons/math/exception/util/LocalizedFormats.java > Fri Sep 23 03:36:11 2011 > > @@ -42,6 +42,7 @@ public enum LocalizedFormats implements > > // CHECKSTYLE: stop JavadocVariable > > > > ARGUMENT_OUTSIDE_DOMAIN("Argument {0} outside domain [{1} ; {2}]"), > > + ARRAY_SIZE_EXCEEDS_MAX_VARIABLES("array size cannot be greater than > {0}"), > > With the new "ExceptionContext" functionality, one can add specialized > messages > to an exception. > This approach could be taken to combine > INPUT_ARRAY > LENGTH > NUMBER_TOO_LARGE > in order to convey all the information about the failure. > > > ARRAY_SIZES_SHOULD_HAVE_DIFFERENCE_1("array sizes should have > difference 1 ({0} != {1} + 1)"), > > ARRAY_SUMS_TO_ZERO("array sums to zero"), > > ASSYMETRIC_EIGEN_NOT_SUPPORTED("eigen decomposition of assymetric > matrices not supported yet"), > > @@ -135,6 +136,7 @@ public enum LocalizedFormats implements > > INVALID_INTERVAL_INITIAL_VALUE_PARAMETERS("invalid interval, initial > value parameters: lower={0}, initial={1}, upper={2}"), > > INVALID_ITERATIONS_LIMITS("invalid iteration limits: min={0}, > max={1}"), > > INVALID_MAX_ITERATIONS("bad value for maximum iterations number: > {0}"), > > + NOT_ENOUGH_DATA_REGRESSION("the number of observations is not > sufficient to conduct regression"), > > There already exists a > NO_DATA > message. Instead of piling up messages with "REGRESSION", we can add a > REGRESSION > and combine with the above to convey the same meaning. > Then, this new building block can be reused with others in order to compose > all sorts of messages about REGRESSION without lengthening the list. > > > > INVALID_REGRESSION_ARRAY("input data array length = {0} does not > match the number of observations = {1} and the number of regressors = {2}"), > > INVALID_REGRESSION_OBSERVATION("length of regressor array = {0} does > not match the number of variables = {1} in the model"), > > INVALID_ROUNDING_METHOD("invalid rounding method {0}, valid methods: > {1} ({2}), {3} ({4}), {5} ({6}), {7} ({8}), {9} ({10}), {11} ({12}), {13} > ({14}), {15} ({16})"), > > @@ -239,6 +241,7 @@ public enum LocalizedFormats implements > > NO_RESULT_AVAILABLE("no result available"), > > NO_SUCH_MATRIX_ENTRY("no entry at indices ({0}, {1}) in a {2}x{3} > matrix"), > > NULL_NOT_ALLOWED("null is not allowed"), /* keep */ > > + ARRAY_ZERO_LENGTH_OR_NULL_NOTALLOWED("A null or zero length array > not allowed"), > > This is downright redundant with > NO_DATA > NULL_NOT_ALLOWED > Moreover exceptions already exist for those situations: "NoDataException" > and "NullArgumentException". They should be use instead of their base > class. > > > COVARIANCE_MATRIX("covariance matrix"), /* keep */ > > DENOMINATOR("denominator"), /* keep */ > > DENOMINATOR_FORMAT("denominator format"), /* keep */ > > > > Modified: > commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/RegressionResults.java > > URL: > http://svn.apache.org/viewvc/commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/RegressionResults.java?rev=1174509&r1=1174508&r2=1174509&view=diff > > > ============================================================================== > > --- > commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/RegressionResults.java > (original) > > +++ > commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/RegressionResults.java > Fri Sep 23 03:36:11 2011 > > @@ -114,7 +114,7 @@ public class RegressionResults implement > > this.globalFitInfo = new double[5]; > > Arrays.fill(this.globalFitInfo, Double.NaN); > > > > - if (rank > 2) { > > + if (rank > 0) { > > this.globalFitInfo[SST_IDX] = containsConstant ? > > (sumysq - sumy * sumy / ((double) nobs)) : sumysq; > > } > > > > Modified: > commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/SimpleRegression.java > > URL: > http://svn.apache.org/viewvc/commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/SimpleRegression.java?rev=1174509&r1=1174508&r2=1174509&view=diff > > > ============================================================================== > > --- > commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/SimpleRegression.java > (original) > > +++ > commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/SimpleRegression.java > Fri Sep 23 03:36:11 2011 > > @@ -22,8 +22,11 @@ import org.apache.commons.math.MathExcep > > import org.apache.commons.math.exception.OutOfRangeException; > > import org.apache.commons.math.distribution.TDistribution; > > import org.apache.commons.math.distribution.TDistributionImpl; > > +import org.apache.commons.math.exception.MathIllegalArgumentException; > > +import org.apache.commons.math.exception.NoDataException; > > import org.apache.commons.math.exception.util.LocalizedFormats; > > import org.apache.commons.math.util.FastMath; > > +import org.apache.commons.math.util.MathUtils; > > > > /** > > * Estimates an ordinary least squares regression model > > @@ -55,7 +58,7 @@ import org.apache.commons.math.util.Fast > > * > > * @version $Id$ > > */ > > -public class SimpleRegression implements Serializable { > > +public class SimpleRegression implements Serializable, > UpdatingMultipleLinearRegression { > > > > /** Serializable version identifier */ > > private static final long serialVersionUID = -3004689053607543335L; > > @@ -98,7 +101,7 @@ public class SimpleRegression implements > > * Secondary constructor which allows the user the ability to > include/exclude const > > * @param includeIntercept boolean flag, true includes an intercept > > */ > > - public SimpleRegression(boolean includeIntercept){ > > + public SimpleRegression(boolean includeIntercept) { > > super(); > > hasIntercept = includeIntercept; > > } > > @@ -116,7 +119,7 @@ public class SimpleRegression implements > > * @param x independent variable value > > * @param y dependent variable value > > */ > > - public void addData(final double x, final double y){ > > + public void addData(final double x,final double y) { > > if (n == 0) { > > xbar = x; > > ybar = y; > > @@ -158,7 +161,7 @@ public class SimpleRegression implements > > * @param x independent variable value > > * @param y dependent variable value > > */ > > - public void removeData(double x, double y) { > > + public void removeData(final double x,final double y) { > > if (n > 0) { > > if (hasIntercept) { > > final double fact1 = (double) n - 1.0; > > @@ -200,13 +203,69 @@ public class SimpleRegression implements > > * data.</p> > > * > > * @param data array of observations to be added > > + * @throws ModelSpecificationException if the length of {@code > data[i]} is not > > + * greater than or equal to 2 > > */ > > - public void addData(double[][] data) { > > + public void addData(final double[][] data) { > > for (int i = 0; i < data.length; i++) { > > + if( data[i].length < 2 ){ > > + throw new > ModelSpecificationException(LocalizedFormats.INVALID_REGRESSION_OBSERVATION, > > + data[i].length, 2); > > + } > > This can be conveyed using > DIMENSIONS_MISMATCH_SIMPLE > DIMENSIONS_MISMATCH > REGRESSION (to be added) > > > addData(data[i][0], data[i][1]); > > } > > + return; > > + } > > + > > + /** > > + * Adds one observation to the regression model. > > + * > > + * @param x the independent variables which form the design matrix > > + * @param y the dependent or response variable > > + * @throws ModelSpecificationException if the length of {@code x} > does not equal > > + * the number of independent variables in the model > > + */ > > + public void addObservation(final double[] x,final double y) throws > ModelSpecificationException{ > > + if( x == null || x.length == 0 ){ > > + throw new > ModelSpecificationException(LocalizedFormats.INVALID_REGRESSION_OBSERVATION,x!=null?x.length:0, > 1); > > + } > > + addData( x[0], y ); > > + return; > > } > > > > + /** > > + * Adds a series of observations to the regression model. The > lengths of > > + * x and y must be the same and x must be rectangular. > > + * > > + * @param x a series of observations on the independent variables > > + * @param y a series of observations on the dependent variable > > + * The length of x and y must be the same > > + * @throws ModelSpecificationException if {@code x} is not > rectangular, does not match > > + * the length of {@code y} or does not contain sufficient data to > estimate the model > > + */ > > + public void addObservations(final double[][] x,final double[] y) { > > + if ((x == null) || (y == null) || (x.length != y.length)) { > > + throw new ModelSpecificationException( > > + LocalizedFormats.DIMENSIONS_MISMATCH_SIMPLE, > > + (x == null) ? 0 : x.length, > > + (y == null) ? 0 : y.length); > > + } > > It is wrong to clump tests for null and for array size. > The message could even be _wrong_ since you report a null as having size 0! > > For null, we should throw "NullArgumentException".[1] > For zero size, "NoDataException". > > > + boolean obsOk=true; > > + for( int i = 0 ; i < x.length; i++){ > > + if( x[i] == null || x[i].length == 0 ){ > > + obsOk = false; > > + } > > + } > > + if( !obsOk ){ > > + throw new ModelSpecificationException( > > + > LocalizedFormats.NOT_ENOUGH_DATA_FOR_NUMBER_OF_PREDICTORS, > > + 0, 1); > > + } > > REGRESSION > NO_DATA > NUMBER_IS_TOO_SMALL > > > + for( int i = 0 ; i < x.length ; i++){ > > + addData( x[i][0], y[i] ); > > + } > > + return; > > + } > > > > /** > > * Removes observations represented by the elements in > <code>data</code>. > > @@ -265,8 +324,8 @@ public class SimpleRegression implements > > * @param x input <code>x</code> value > > * @return predicted <code>y</code> value > > */ > > - public double predict(double x) { > > - double b1 = getSlope(); > > + public double predict(final double x) { > > + final double b1 = getSlope(); > > if (hasIntercept) { > > return getIntercept(b1) + b1 * x; > > } > > @@ -298,7 +357,7 @@ public class SimpleRegression implements > > * > > * @return true if constant exists, false otherwise > > */ > > - public boolean hasIntercept(){ > > + public boolean hasIntercept() { > > return hasIntercept; > > } > > > > @@ -572,7 +631,7 @@ public class SimpleRegression implements > > * @return half-width of 95% confidence interval for the slope > estimate > > * @throws MathException if the confidence interval can not be > computed. > > */ > > - public double getSlopeConfidenceInterval(double alpha) > > + public double getSlopeConfidenceInterval(final double alpha) > > throws MathException { > > if (alpha >= 1 || alpha <= 0) { > > throw new > OutOfRangeException(LocalizedFormats.SIGNIFICANCE_LEVEL, > > @@ -620,7 +679,7 @@ public class SimpleRegression implements > > * @param slope current slope > > * @return the intercept of the regression line > > */ > > - private double getIntercept(double slope){ > > + private double getIntercept(final double slope) { > > if( hasIntercept){ > > return (sumY - slope * sumX) / n; > > } > > @@ -633,7 +692,134 @@ public class SimpleRegression implements > > * @param slope regression slope estimate > > * @return sum of squared deviations of predicted y values > > */ > > - private double getRegressionSumSquares(double slope) { > > + private double getRegressionSumSquares(final double slope) { > > return slope * slope * sumXX; > > } > > + > > + /** > > + * Performs a regression on data present in buffers and outputs a > RegressionResults object > > + * @return RegressionResults acts as a container of regression > output > > + * @throws ModelSpecificationException if the model is not correctly > specified > > + */ > > + public RegressionResults regress() throws > ModelSpecificationException{ > > + if( hasIntercept ){ > > + if( n < 3 ){ > > + throw new NoDataException( > LocalizedFormats.NOT_ENOUGH_DATA_REGRESSION ); > > + } > > This is slightly misleading ("no data" or "not enough data"?). > > > + if( FastMath.abs( sumXX ) > MathUtils.SAFE_MIN ){ > > + final double[] params = new double[]{ getIntercept(), > getSlope() }; > > + final double mse = getMeanSquareError(); > > + final double _syy = sumYY + sumY * sumY / ((double) n); > > + final double[] vcv = new double[]{ > > + mse * (xbar *xbar /sumXX + 1.0 / ((double) n)), > > + -xbar*mse/sumXX, > > + mse/sumXX }; > > + return new RegressionResults( > > + params, new double[][]{vcv}, true, n, 2, > > + sumY, _syy, getSumSquaredErrors(),true,false); > > + }else{ > > + final double[] params = new double[]{ sumY/((double) n), > Double.NaN }; > > + //final double mse = getMeanSquareError(); > > + final double[] vcv = new double[]{ > > + ybar / ((double) n - 1.0), > > + Double.NaN, > > + Double.NaN }; > > + return new RegressionResults( > > + params, new double[][]{vcv}, true, n, 1, > > + sumY, sumYY, getSumSquaredErrors(),true,false); > > + } > > + }else{ > > + if( n < 2 ){ > > + throw new NoDataException( > LocalizedFormats.NOT_ENOUGH_DATA_REGRESSION ); > > + } > > Same as above. > > > + if( !Double.isNaN(sumXX) ){ > > + final double[] vcv = new double[]{ getMeanSquareError() / > sumXX }; > > + final double[] params = new double[]{ sumXY/sumXX }; > > + return new RegressionResults( > > + params, new double[][]{vcv}, true, n, 1, > > + sumY, sumYY, getSumSquaredErrors(),false,false); > > + }else{ > > + final double[] vcv = new double[]{Double.NaN }; > > + final double[] params = new double[]{ Double.NaN }; > > + return new RegressionResults( > > + params, new double[][]{vcv}, true, n, 1, > > + Double.NaN, Double.NaN, Double.NaN,false,false); > > + } > > + } > > + } > > + > > + /** > > + * Performs a regression on data present in buffers including only > regressors > > + * indexed in variablesToInclude and outputs a RegressionResults > object > > + * @param variablesToInclude an array of indices of regressors to > include > > + * @return RegressionResults acts as a container of regression > output > > + * @throws ModelSpecificationException if the model is not correctly > specified > > + * @throws MathIllegalArgumentException if the variablesToInclude > array is null or zero length > > + * @throws OutOfRangeException if a requested variable is not > present in model > > + */ > > + public RegressionResults regress(int[] variablesToInclude) throws > ModelSpecificationException{ > > + if( variablesToInclude == null || variablesToInclude.length == > 0){ > > + throw new > MathIllegalArgumentException(LocalizedFormats.ARRAY_ZERO_LENGTH_OR_NULL_NOTALLOWED); > > + } > > Clumped tests. > > > + if( variablesToInclude.length > 2 || (variablesToInclude.length > > 1 && !hasIntercept) ){ > > + throw new ModelSpecificationException( > > + LocalizedFormats.ARRAY_SIZE_EXCEEDS_MAX_VARIABLES, > > + (variablesToInclude.length > 1 && !hasIntercept) ? 1 > : 2); > > + } > > REGRESSION > NUMBER_IS_TOO_LARGE > > > + if( hasIntercept ){ > > + if( variablesToInclude.length == 2 ){ > > + if( variablesToInclude[0] == 1 ){ > > + throw new > ModelSpecificationException(LocalizedFormats.NOT_INCREASING_SEQUENCE); > > This one should throw "NonMonotonicSequenceException". > I really don't see the value of clumping completely different failure into > the same "ModelSpecificationException". This renders the exception type > useless. > [The point of having typed exceptionz is to provide the flexibility of > dealing with them programmatically. When the actual causes of failure > varies > wildly, this won't be possible.] > > > + }else if( variablesToInclude[0] != 0 ){ > > + throw new OutOfRangeException( > variablesToInclude[0], 0,1 ); > > + } > > + if( variablesToInclude[1] != 1){ > > + throw new OutOfRangeException( > variablesToInclude[0], 0,1 ); > > + } > > + return regress(); > > + }else{ > > + if( variablesToInclude[0] != 1 && variablesToInclude[0] > != 0 ){ > > + throw new OutOfRangeException( > variablesToInclude[0],0,1 ); > > + } > > + final double _mean = sumY * sumY / ((double) n); > > + final double _syy = sumYY + _mean; > > + if( variablesToInclude[0] == 0 ){ > > + //just the mean > > + final double[] vcv = new double[]{ > sumYY/((double)((n-1)*n)) }; > > + final double[] params = new double[]{ ybar }; > > + return new RegressionResults( > > + params, new double[][]{vcv}, true, n, 1, > > + sumY, _syy+_mean, sumYY,true,false); > > + > > + }else if( variablesToInclude[0] == 1){ > > + //final double _syy = sumYY + sumY * sumY / > ((double) n); > > + final double _sxx = sumXX + sumX * sumX / ((double) > n); > > + final double _sxy = sumXY + sumX * sumY / ((double) > n); > > + final double _sse = FastMath.max(0d, _syy - _sxy * > _sxy / _sxx); > > + final double _mse = _sse/((double)(n-1)); > > + if( !Double.isNaN(_sxx) ){ > > + final double[] vcv = new double[]{ _mse / _sxx > }; > > + final double[] params = new double[]{ _sxy/_sxx > }; > > + return new RegressionResults( > > + params, new double[][]{vcv}, true, > n, 1, > > + sumY, _syy, _sse,false,false); > > + }else{ > > + final double[] vcv = new double[]{Double.NaN }; > > + final double[] params = new double[]{ Double.NaN > }; > > + return new RegressionResults( > > + params, new double[][]{vcv}, true, > n, 1, > > + Double.NaN, Double.NaN, > Double.NaN,false,false); > > + } > > + } > > + } > > + }else{ > > + if( variablesToInclude[0] != 0 ){ > > + throw new > OutOfRangeException(variablesToInclude[0],0,0); > > + } > > + return regress(); > > + } > > + > > + return null; > > + } > > } > > > > Modified: > commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/UpdatingMultipleLinearRegression.java > > URL: > http://svn.apache.org/viewvc/commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/UpdatingMultipleLinearRegression.java?rev=1174509&r1=1174508&r2=1174509&view=diff > > > ============================================================================== > > --- > commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/UpdatingMultipleLinearRegression.java > (original) > > +++ > commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/UpdatingMultipleLinearRegression.java > Fri Sep 23 03:36:11 2011 > > @@ -61,7 +61,7 @@ public interface UpdatingMultipleLinearR > > * @throws ModelSpecificationException if {@code x} is not > rectangular, does not match > > * the length of {@code y} or does not contain sufficient data to > estimate the model > > */ > > - void addObservations(double[][] x, double[] y); > > + void addObservations(double[][] x, double[] y) throws > ModelSpecificationException; > > This is exactly what I say just above: a single exception for totally > unrelated failures. > [And it is yet another example of mixing (i.e. confusing) library and > application concerns: The library must check the preconditions that are > needed to perform its work, and throw an exception that it commensurate > with > the failed test. Then, the application layer (the caller) can possibly > translate this into a "business" language. At the level of CM, we lack the > full context in order to be sure that one fixed high-level message wil be > adequate in all circumstances. As I've written numerous times also: When > designing code for CM, we must strive to separate the roles of "CM user" > and > "CM developer".] > > > /** > > * Clears internal buffers and resets the regression model. This > means all > > > > [...] > > Also, please be careful of the code formatting style. This commit contains > many violations of the following rules: > * There should be a space character before a "{" character. > * There should be a space character on both sides of a keyword. > * There should be a space on both sides of an operator. > * There should not be a space after an opening parenthesis. > * There should not be a space before a closing parenthesis. > * There should not be a space before a ";" character. > * Variables (instance or local) should not contain a "_" character. > * Indents should be (muliples of) 4 space characters wide. > > > Thanks and best regards, > Gilles > > [1] There is an unsettled issue about whether the CM code should check for > "null" (and throw "NullArgumentException") or let the JVM throw the > standard NPE. > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > >