On 7/13/11 9:34 PM, Greg Sterijevski wrote:
> All,
>
> I am working on some additions to the regression package and have run into a
> bit of difficulty.
>
> The statistical R Squared is equal to 1.0 -
> SumOfSquaredError/SumOfSquaresTotal. Say that I run my regression two
> different ways. The first manner I tell the regression technique to include
> a constant, so the SumOfSquaresTotal = Summation of ( Y - Mean(Y) ) ^2. In
> the next run, I tell the regression technique not to include a constant, but
> I do include one in the data I supply ( one rhs variable is always set to
> one). The models are identical, but the R Squared may not be consistent,
> since in the second run I will assume Mean(Y) = 0.0.
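To make the discrepancy concrete, here is a minimal Java sketch. The class and method names (`RSquaredDemo`, `sumSquaredErrors`, `centeredRSquared`, `uncenteredRSquared`) are hypothetical and are not Commons Math API. It assumes a single regressor x plus a constant and fits by closed-form ordinary least squares. A no-intercept fit whose design includes a column of ones spans the same column space as a fit with an intercept, so SumOfSquaredError is identical in both runs; only SumOfSquaresTotal changes, from Summation of (Y - Mean(Y))^2 in the first run to Summation of Y^2 in the second.

```java
/**
 * Hypothetical illustration (not part of Commons Math): fits y = a + b*x by
 * ordinary least squares and reports R-squared under two definitions of the
 * total sum of squares.
 */
final class RSquaredDemo {

    static double mean(double[] v) {
        double s = 0;
        for (double e : v) s += e;
        return s / v.length;
    }

    /** Sum of squared residuals of the OLS fit y = a + b*x. */
    static double sumSquaredErrors(double[] x, double[] y) {
        double xBar = mean(x), yBar = mean(y);
        double sxy = 0, sxx = 0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - xBar) * (y[i] - yBar);
            sxx += (x[i] - xBar) * (x[i] - xBar);
        }
        double b = sxy / sxx;          // slope
        double a = yBar - b * xBar;    // intercept (same fit as a column of ones)
        double sse = 0;
        for (int i = 0; i < x.length; i++) {
            double r = y[i] - (a + b * x[i]);
            sse += r * r;
        }
        return sse;
    }

    /** Intercept flag set: SumOfSquaresTotal = sum of (y - mean(y))^2. */
    static double centeredRSquared(double[] x, double[] y) {
        double yBar = mean(y);
        double sst = 0;
        for (double v : y) sst += (v - yBar) * (v - yBar);
        return 1.0 - sumSquaredErrors(x, y) / sst;
    }

    /** Intercept flag off, mean(y) assumed 0: SumOfSquaresTotal = sum of y^2. */
    static double uncenteredRSquared(double[] x, double[] y) {
        double sst = 0;
        for (double v : y) sst += v * v;
        return 1.0 - sumSquaredErrors(x, y) / sst;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {2.1, 3.9, 6.2, 7.8, 10.1};
        System.out.printf("centered R^2   = %.6f%n", centeredRSquared(x, y));
        System.out.printf("uncentered R^2 = %.6f%n", uncenteredRSquared(x, y));
    }
}
```

Since Summation of Y^2 = Summation of (Y - Mean(Y))^2 + n * Mean(Y)^2, the second denominator is never smaller, so the second run reports a higher R Squared whenever Mean(Y) != 0 and the fit is not exact, even though the residuals are the same.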
The models are not identical, because in the second case the unitary (all-ones) column is a (zero-variance) regressor. I would say whoever supplied the data did not understand the API, and the reported R-square is meaningless. This is why we indicate in the javadoc that the data should *not* include unitary columns; the hasIntercept property should be used instead to indicate that the model should include an intercept term.

>
> The question to the list is what is the proper course of action? Ignore it
> and leave the obvious inconsistency? Force a mean? (That's not exactly a
> good solution). Empirically test the data as it comes in? If an independent
> variable exhibits zero variance, then it must be the constant. I then set
> the flag for it, and get the correct result?

It is never a good idea to try to "fix" API abuses, data anomalies, or bad specifications, other than to throw exceptions on easily discernible precondition violations. In this case, for example, zero-variance regressors may actually occur in data, and our decision to change the model specification may not be at all what the user wants. Our contract with users is that we clearly document how the API works (preconditions, algorithms, etc.) and compute what we say we compute. I would say don't do anything special in this case.

Phil

>
> Thoughts?
>
> -Greg