Okay, my bad. I thought I had looked through the list of enums, but I
obviously was not as thorough as I should have been. Will make the changes
that you suggest.

Thank you,

-Greg

On Fri, Sep 23, 2011 at 5:51 AM, Gilles Sadowski <
gil...@harfang.homelinux.org> wrote:

> Hello.
>
> > Author: gregs
> > Date: Fri Sep 23 03:36:11 2011
> > New Revision: 1174509
> >
> > URL: http://svn.apache.org/viewvc?rev=1174509&view=rev
> > Log:
> > JIRA: MATH-607 Adding support for UpdatingMultipleLinearRegression to
> SimpleRegression
> >
> > Modified:
> >
> commons/proper/math/trunk/src/main/java/org/apache/commons/math/exception/util/LocalizedFormats.java
>
> Part of the rationale for implementing exception objects was to reduce the
> number of overlapping or redundant messages in the "LocalizedFormats" enum.
>
> As I had indicated in the numerous discussions related to this subject, the
> sheer size of that list of messages induces one to add more. This has just
> happened again.
>
> >
> > Modified:
> commons/proper/math/trunk/src/main/java/org/apache/commons/math/exception/util/LocalizedFormats.java
> > URL:
> http://svn.apache.org/viewvc/commons/proper/math/trunk/src/main/java/org/apache/commons/math/exception/util/LocalizedFormats.java?rev=1174509&r1=1174508&r2=1174509&view=diff
> >
> ==============================================================================
> > ---
> commons/proper/math/trunk/src/main/java/org/apache/commons/math/exception/util/LocalizedFormats.java
> (original)
> > +++
> commons/proper/math/trunk/src/main/java/org/apache/commons/math/exception/util/LocalizedFormats.java
> Fri Sep 23 03:36:11 2011
> > @@ -42,6 +42,7 @@ public enum LocalizedFormats implements
> >      // CHECKSTYLE: stop JavadocVariable
> >
> >      ARGUMENT_OUTSIDE_DOMAIN("Argument {0} outside domain [{1} ; {2}]"),
> > +    ARRAY_SIZE_EXCEEDS_MAX_VARIABLES("array size cannot be greater than
> {0}"),
>
> With the new "ExceptionContext" functionality, one can add specialized
> messages
> to an exception.
> This approach could be taken to combine
>  INPUT_ARRAY
>  LENGTH
>  NUMBER_TOO_LARGE
> in order to convey all the information about the failure.
>
> >      ARRAY_SIZES_SHOULD_HAVE_DIFFERENCE_1("array sizes should have
> difference 1 ({0} != {1} + 1)"),
> >      ARRAY_SUMS_TO_ZERO("array sums to zero"),
> >      ASSYMETRIC_EIGEN_NOT_SUPPORTED("eigen decomposition of assymetric
> matrices not supported yet"),
> > @@ -135,6 +136,7 @@ public enum LocalizedFormats implements
> >      INVALID_INTERVAL_INITIAL_VALUE_PARAMETERS("invalid interval, initial
> value parameters:  lower={0}, initial={1}, upper={2}"),
> >      INVALID_ITERATIONS_LIMITS("invalid iteration limits: min={0},
> max={1}"),
> >      INVALID_MAX_ITERATIONS("bad value for maximum iterations number:
> {0}"),
> > +    NOT_ENOUGH_DATA_REGRESSION("the number of observations is not
> sufficient to conduct regression"),
>
> There already exists a
>  NO_DATA
> message. Instead of piling up messages with "REGRESSION", we can add a
>  REGRESSION
> and combine with the above to convey the same meaning.
> Then, this new building block can be reused with others in order to compose
> all sorts of messages about REGRESSION without lengthening the list.
>
>
> >      INVALID_REGRESSION_ARRAY("input data array length = {0} does not
> match the number of observations = {1} and the number of regressors = {2}"),
> >      INVALID_REGRESSION_OBSERVATION("length of regressor array = {0} does
> not match the number of variables = {1} in the model"),
> >      INVALID_ROUNDING_METHOD("invalid rounding method {0}, valid methods:
> {1} ({2}), {3} ({4}), {5} ({6}), {7} ({8}), {9} ({10}), {11} ({12}), {13}
> ({14}), {15} ({16})"),
> > @@ -239,6 +241,7 @@ public enum LocalizedFormats implements
> >      NO_RESULT_AVAILABLE("no result available"),
> >      NO_SUCH_MATRIX_ENTRY("no entry at indices ({0}, {1}) in a {2}x{3}
> matrix"),
> >      NULL_NOT_ALLOWED("null is not allowed"), /* keep */
> > +    ARRAY_ZERO_LENGTH_OR_NULL_NOTALLOWED("A null or zero length array
> not allowed"),
>
> This is downright redundant with
>  NO_DATA
>  NULL_NOT_ALLOWED
> Moreover exceptions already exist for those situations: "NoDataException"
> and "NullArgumentException". They should be use instead of their base
> class.
>
> >      COVARIANCE_MATRIX("covariance matrix"), /* keep */
> >      DENOMINATOR("denominator"), /* keep */
> >      DENOMINATOR_FORMAT("denominator format"), /* keep */
> >
> > Modified:
> commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/RegressionResults.java
> > URL:
> http://svn.apache.org/viewvc/commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/RegressionResults.java?rev=1174509&r1=1174508&r2=1174509&view=diff
> >
> ==============================================================================
> > ---
> commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/RegressionResults.java
> (original)
> > +++
> commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/RegressionResults.java
> Fri Sep 23 03:36:11 2011
> > @@ -114,7 +114,7 @@ public class RegressionResults implement
> >          this.globalFitInfo = new double[5];
> >          Arrays.fill(this.globalFitInfo, Double.NaN);
> >
> > -        if (rank > 2) {
> > +        if (rank > 0) {
> >              this.globalFitInfo[SST_IDX] = containsConstant ?
> >                      (sumysq - sumy * sumy / ((double) nobs)) : sumysq;
> >          }
> >
> > Modified:
> commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/SimpleRegression.java
> > URL:
> http://svn.apache.org/viewvc/commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/SimpleRegression.java?rev=1174509&r1=1174508&r2=1174509&view=diff
> >
> ==============================================================================
> > ---
> commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/SimpleRegression.java
> (original)
> > +++
> commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/SimpleRegression.java
> Fri Sep 23 03:36:11 2011
> > @@ -22,8 +22,11 @@ import org.apache.commons.math.MathExcep
> >  import org.apache.commons.math.exception.OutOfRangeException;
> >  import org.apache.commons.math.distribution.TDistribution;
> >  import org.apache.commons.math.distribution.TDistributionImpl;
> > +import org.apache.commons.math.exception.MathIllegalArgumentException;
> > +import org.apache.commons.math.exception.NoDataException;
> >  import org.apache.commons.math.exception.util.LocalizedFormats;
> >  import org.apache.commons.math.util.FastMath;
> > +import org.apache.commons.math.util.MathUtils;
> >
> >  /**
> >   * Estimates an ordinary least squares regression model
> > @@ -55,7 +58,7 @@ import org.apache.commons.math.util.Fast
> >   *
> >   * @version $Id$
> >   */
> > -public class SimpleRegression implements Serializable {
> > +public class SimpleRegression implements Serializable,
> UpdatingMultipleLinearRegression {
> >
> >      /** Serializable version identifier */
> >      private static final long serialVersionUID = -3004689053607543335L;
> > @@ -98,7 +101,7 @@ public class SimpleRegression implements
> >      * Secondary constructor which allows the user the ability to
> include/exclude const
> >      * @param includeIntercept boolean flag, true includes an intercept
> >      */
> > -    public SimpleRegression(boolean includeIntercept){
> > +    public SimpleRegression(boolean includeIntercept) {
> >          super();
> >          hasIntercept = includeIntercept;
> >      }
> > @@ -116,7 +119,7 @@ public class SimpleRegression implements
> >       * @param x independent variable value
> >       * @param y dependent variable value
> >       */
> > -    public void addData(final double x, final double y){
> > +    public void addData(final double x,final double y) {
> >          if (n == 0) {
> >              xbar = x;
> >              ybar = y;
> > @@ -158,7 +161,7 @@ public class SimpleRegression implements
> >       * @param x independent variable value
> >       * @param y dependent variable value
> >       */
> > -    public void removeData(double x, double y) {
> > +    public void removeData(final double x,final double y) {
> >          if (n > 0) {
> >              if (hasIntercept) {
> >                  final double fact1 = (double) n - 1.0;
> > @@ -200,13 +203,69 @@ public class SimpleRegression implements
> >       * data.</p>
> >       *
> >       * @param data array of observations to be added
> > +     * @throws ModelSpecificationException if the length of {@code
> data[i]} is not
> > +     * greater than or equal to 2
> >       */
> > -    public void addData(double[][] data) {
> > +    public void addData(final double[][] data) {
> >          for (int i = 0; i < data.length; i++) {
> > +            if( data[i].length < 2 ){
> > +               throw new
> ModelSpecificationException(LocalizedFormats.INVALID_REGRESSION_OBSERVATION,
> > +                    data[i].length, 2);
> > +            }
>
> This can be conveyed using
>  DIMENSIONS_MISMATCH_SIMPLE
>  DIMENSIONS_MISMATCH
>  REGRESSION (to be added)
>
> >              addData(data[i][0], data[i][1]);
> >          }
> > +        return;
> > +    }
> > +
> > +    /**
> > +     * Adds one observation to the regression model.
> > +     *
> > +     * @param x the independent variables which form the design matrix
> > +     * @param y the dependent or response variable
> > +     * @throws ModelSpecificationException if the length of {@code x}
> does not equal
> > +     * the number of independent variables in the model
> > +     */
> > +    public void addObservation(final double[] x,final double y) throws
> ModelSpecificationException{
> > +        if( x == null || x.length == 0 ){
> > +            throw new
> ModelSpecificationException(LocalizedFormats.INVALID_REGRESSION_OBSERVATION,x!=null?x.length:0,
> 1);
> > +        }
> > +        addData( x[0], y );
> > +        return;
> >      }
> >
> > +    /**
> > +     * Adds a series of observations to the regression model. The
> lengths of
> > +     * x and y must be the same and x must be rectangular.
> > +     *
> > +     * @param x a series of observations on the independent variables
> > +     * @param y a series of observations on the dependent variable
> > +     * The length of x and y must be the same
> > +     * @throws ModelSpecificationException if {@code x} is not
> rectangular, does not match
> > +     * the length of {@code y} or does not contain sufficient data to
> estimate the model
> > +     */
> > +    public void addObservations(final double[][] x,final double[] y) {
> > +        if ((x == null) || (y == null) || (x.length != y.length)) {
> > +            throw new ModelSpecificationException(
> > +                  LocalizedFormats.DIMENSIONS_MISMATCH_SIMPLE,
> > +                  (x == null) ? 0 : x.length,
> > +                  (y == null) ? 0 : y.length);
> > +        }
>
> It is wrong to clump tests for null and for array size.
> The message could even be _wrong_ since you report a null as having size 0!
>
> For null, we should throw "NullArgumentException".[1]
> For zero size, "NoDataException".
>
> > +        boolean obsOk=true;
> > +        for( int i = 0 ; i < x.length; i++){
> > +            if( x[i] == null || x[i].length == 0 ){
> > +                obsOk = false;
> > +            }
> > +        }
> > +        if( !obsOk ){
> > +            throw new ModelSpecificationException(
> > +
>  LocalizedFormats.NOT_ENOUGH_DATA_FOR_NUMBER_OF_PREDICTORS,
> > +                  0, 1);
> > +        }
>
> REGRESSION
> NO_DATA
> NUMBER_IS_TOO_SMALL
>
> > +        for( int i = 0 ; i < x.length ; i++){
> > +            addData( x[i][0], y[i] );
> > +        }
> > +        return;
> > +    }
> >
> >      /**
> >       * Removes observations represented by the elements in
> <code>data</code>.
> > @@ -265,8 +324,8 @@ public class SimpleRegression implements
> >       * @param x input <code>x</code> value
> >       * @return predicted <code>y</code> value
> >       */
> > -    public double predict(double x) {
> > -        double b1 = getSlope();
> > +    public double predict(final double x) {
> > +        final double b1 = getSlope();
> >          if (hasIntercept) {
> >              return getIntercept(b1) + b1 * x;
> >          }
> > @@ -298,7 +357,7 @@ public class SimpleRegression implements
> >       *
> >       * @return true if constant exists, false otherwise
> >       */
> > -    public boolean hasIntercept(){
> > +    public boolean hasIntercept() {
> >          return hasIntercept;
> >      }
> >
> > @@ -572,7 +631,7 @@ public class SimpleRegression implements
> >       * @return half-width of 95% confidence interval for the slope
> estimate
> >       * @throws MathException if the confidence interval can not be
> computed.
> >       */
> > -    public double getSlopeConfidenceInterval(double alpha)
> > +    public double getSlopeConfidenceInterval(final double alpha)
> >          throws MathException {
> >          if (alpha >= 1 || alpha <= 0) {
> >              throw new
> OutOfRangeException(LocalizedFormats.SIGNIFICANCE_LEVEL,
> > @@ -620,7 +679,7 @@ public class SimpleRegression implements
> >      * @param slope current slope
> >      * @return the intercept of the regression line
> >      */
> > -    private double getIntercept(double slope){
> > +    private double getIntercept(final double slope) {
> >        if( hasIntercept){
> >          return (sumY - slope * sumX) / n;
> >        }
> > @@ -633,7 +692,134 @@ public class SimpleRegression implements
> >       * @param slope regression slope estimate
> >       * @return sum of squared deviations of predicted y values
> >       */
> > -    private double getRegressionSumSquares(double slope) {
> > +    private double getRegressionSumSquares(final double slope) {
> >          return slope * slope * sumXX;
> >      }
> > +
> > +    /**
> > +     * Performs a regression on data present in buffers and outputs a
> RegressionResults object
> > +     * @return RegressionResults acts as a container of regression
> output
> > +     * @throws ModelSpecificationException if the model is not correctly
> specified
> > +     */
> > +    public RegressionResults regress() throws
> ModelSpecificationException{
> > +        if( hasIntercept ){
> > +          if( n < 3 ){
> > +              throw new NoDataException(
> LocalizedFormats.NOT_ENOUGH_DATA_REGRESSION );
> > +          }
>
> This is slightly misleading ("no data" or "not enough data"?).
>
> > +          if( FastMath.abs( sumXX ) > MathUtils.SAFE_MIN ){
> > +              final double[] params = new double[]{ getIntercept(),
> getSlope() };
> > +              final double mse = getMeanSquareError();
> > +              final double _syy = sumYY + sumY * sumY / ((double) n);
> > +              final double[] vcv = new double[]{
> > +                mse * (xbar *xbar /sumXX + 1.0 / ((double) n)),
> > +                -xbar*mse/sumXX,
> > +                mse/sumXX };
> > +              return new RegressionResults(
> > +                      params, new double[][]{vcv}, true, n, 2,
> > +                      sumY, _syy, getSumSquaredErrors(),true,false);
> > +          }else{
> > +              final double[] params = new double[]{ sumY/((double) n),
> Double.NaN };
> > +              //final double mse = getMeanSquareError();
> > +              final double[] vcv = new double[]{
> > +                ybar / ((double) n - 1.0),
> > +                Double.NaN,
> > +                Double.NaN };
> > +              return new RegressionResults(
> > +                      params, new double[][]{vcv}, true, n, 1,
> > +                      sumY, sumYY, getSumSquaredErrors(),true,false);
> > +          }
> > +        }else{
> > +          if( n < 2 ){
> > +              throw new NoDataException(
> LocalizedFormats.NOT_ENOUGH_DATA_REGRESSION );
> > +          }
>
> Same as above.
>
> > +          if( !Double.isNaN(sumXX) ){
> > +          final double[] vcv = new double[]{ getMeanSquareError() /
> sumXX };
> > +          final double[] params = new double[]{ sumXY/sumXX };
> > +          return new RegressionResults(
> > +                      params, new double[][]{vcv}, true, n, 1,
> > +                      sumY, sumYY, getSumSquaredErrors(),false,false);
> > +          }else{
> > +          final double[] vcv = new double[]{Double.NaN };
> > +          final double[] params = new double[]{ Double.NaN };
> > +          return new RegressionResults(
> > +                      params, new double[][]{vcv}, true, n, 1,
> > +                      Double.NaN, Double.NaN, Double.NaN,false,false);
> > +          }
> > +        }
> > +    }
> > +
> > +    /**
> > +     * Performs a regression on data present in buffers including only
> regressors
> > +     * indexed in variablesToInclude and outputs a RegressionResults
> object
> > +     * @param variablesToInclude an array of indices of regressors to
> include
> > +     * @return RegressionResults acts as a container of regression
> output
> > +     * @throws ModelSpecificationException if the model is not correctly
> specified
> > +     * @throws MathIllegalArgumentException if the variablesToInclude
> array is null or zero length
> > +     * @throws OutOfRangeException if a requested variable is not
> present in model
> > +     */
> > +    public RegressionResults regress(int[] variablesToInclude) throws
> ModelSpecificationException{
> > +        if( variablesToInclude == null || variablesToInclude.length ==
> 0){
> > +          throw new
> MathIllegalArgumentException(LocalizedFormats.ARRAY_ZERO_LENGTH_OR_NULL_NOTALLOWED);
> > +        }
>
> Clumped tests.
>
> > +        if( variablesToInclude.length > 2 || (variablesToInclude.length
> > 1 && !hasIntercept) ){
> > +            throw new ModelSpecificationException(
> > +                    LocalizedFormats.ARRAY_SIZE_EXCEEDS_MAX_VARIABLES,
> > +                    (variablesToInclude.length > 1 && !hasIntercept) ? 1
> : 2);
> > +        }
>
> REGRESSION
> NUMBER_IS_TOO_LARGE
>
> > +        if( hasIntercept ){
> > +            if( variablesToInclude.length == 2 ){
> > +                if( variablesToInclude[0] == 1 ){
> > +                    throw new
> ModelSpecificationException(LocalizedFormats.NOT_INCREASING_SEQUENCE);
>
> This one should throw "NonMonotonicSequenceException".
> I really don't see the value of clumping completely different failure into
> the same "ModelSpecificationException". This renders the exception type
> useless.
> [The point of having typed exceptionz is to provide the flexibility of
> dealing with them programmatically. When the actual causes of failure
> varies
> wildly, this won't be possible.]
>
> > +                }else if( variablesToInclude[0] != 0 ){
> > +                    throw new OutOfRangeException(
> variablesToInclude[0], 0,1 );
> > +                }
> > +                if( variablesToInclude[1] != 1){
> > +                     throw new OutOfRangeException(
> variablesToInclude[0], 0,1 );
> > +                }
> > +                return regress();
> > +            }else{
> > +                if( variablesToInclude[0] != 1 && variablesToInclude[0]
> != 0 ){
> > +                     throw new OutOfRangeException(
> variablesToInclude[0],0,1 );
> > +                }
> > +                final double _mean = sumY * sumY / ((double) n);
> > +                final double _syy = sumYY + _mean;
> > +                if( variablesToInclude[0] == 0 ){
> > +                    //just the mean
> > +                    final double[] vcv = new double[]{
> sumYY/((double)((n-1)*n)) };
> > +                    final double[] params = new double[]{ ybar };
> > +                    return new RegressionResults(
> > +                      params, new double[][]{vcv}, true, n, 1,
> > +                      sumY, _syy+_mean, sumYY,true,false);
> > +
> > +                }else if( variablesToInclude[0] == 1){
> > +                    //final double _syy = sumYY + sumY * sumY /
> ((double) n);
> > +                    final double _sxx = sumXX + sumX * sumX / ((double)
> n);
> > +                    final double _sxy = sumXY + sumX * sumY / ((double)
> n);
> > +                    final double _sse = FastMath.max(0d, _syy - _sxy *
> _sxy / _sxx);
> > +                    final double _mse = _sse/((double)(n-1));
> > +                    if( !Double.isNaN(_sxx) ){
> > +                        final double[] vcv = new double[]{ _mse / _sxx
> };
> > +                        final double[] params = new double[]{ _sxy/_sxx
> };
> > +                        return new RegressionResults(
> > +                                    params, new double[][]{vcv}, true,
> n, 1,
> > +                                    sumY, _syy, _sse,false,false);
> > +                    }else{
> > +                        final double[] vcv = new double[]{Double.NaN };
> > +                        final double[] params = new double[]{ Double.NaN
> };
> > +                        return new RegressionResults(
> > +                                    params, new double[][]{vcv}, true,
> n, 1,
> > +                                    Double.NaN, Double.NaN,
> Double.NaN,false,false);
> > +                    }
> > +                }
> > +            }
> > +        }else{
> > +            if( variablesToInclude[0] != 0 ){
> > +                throw new
> OutOfRangeException(variablesToInclude[0],0,0);
> > +            }
> > +            return regress();
> > +        }
> > +
> > +        return null;
> > +    }
> >  }
> >
> > Modified:
> commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/UpdatingMultipleLinearRegression.java
> > URL:
> http://svn.apache.org/viewvc/commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/UpdatingMultipleLinearRegression.java?rev=1174509&r1=1174508&r2=1174509&view=diff
> >
> ==============================================================================
> > ---
> commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/UpdatingMultipleLinearRegression.java
> (original)
> > +++
> commons/proper/math/trunk/src/main/java/org/apache/commons/math/stat/regression/UpdatingMultipleLinearRegression.java
> Fri Sep 23 03:36:11 2011
> > @@ -61,7 +61,7 @@ public interface UpdatingMultipleLinearR
> >       * @throws ModelSpecificationException if {@code x} is not
> rectangular, does not match
> >       * the length of {@code y} or does not contain sufficient data to
> estimate the model
> >       */
> > -    void addObservations(double[][] x, double[] y);
> > +    void addObservations(double[][] x, double[] y) throws
> ModelSpecificationException;
>
> This is exactly what I say just above: a single exception for totally
> unrelated failures.
> [And it is yet another example of mixing (i.e. confusing) library and
> application concerns: The library must check the preconditions that are
> needed to perform its work, and throw an exception that it commensurate
> with
> the failed test. Then, the application layer (the caller) can possibly
> translate this into a "business" language. At the level of CM, we lack the
> full context in order to be sure that one fixed high-level message wil be
> adequate in all circumstances. As I've written numerous times also: When
> designing code for CM, we must strive to separate the roles of "CM user"
> and
> "CM developer".]
>
> >      /**
> >       * Clears internal buffers and resets the regression model. This
> means all
> >
> > [...]
>
> Also, please be careful of the code formatting style. This commit contains
> many violations of the following rules:
>  * There should be a space character before a "{" character.
>  * There should be a space character on both sides of a keyword.
>  * There should be a space on both sides of an operator.
>  * There should not be a space after an opening parenthesis.
>  * There should not be a space before a closing parenthesis.
>  * There should not be a space before a ";" character.
>  * Variables (instance or local) should not contain a "_" character.
>  * Indents should be (muliples of) 4 space characters wide.
>
>
> Thanks and best regards,
> Gilles
>
> [1] There is an unsettled issue about whether the CM code should check for
>    "null" (and throw "NullArgumentException") or let the JVM throw the
>    standard NPE.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>

Reply via email to