Hello all,

Sorry for being a bit slow on the uptake; I am still in the wilds of numerical imprecision with the Longley data. I am getting close to figuring out where the error is being accumulated.
I agree that interfaces impose rigidity on the design. However, the various linear regression techniques share a lot of structure. Whether we are doing OLS, panel regression, ridge regression, robust regression, etc., the model is linear: Y = XB + e. The estimation technique may be complicated (or non-linear), but that is an implementation detail. I like interfaces because they force a discipline on the code: they make you specify a minimum contract, which often makes larger problems more tractable. The end user should be able to swap one technique for another with minimal recoding. Interfaces also make it easy to write RMI stubs and make distributed calls.

I like abstract classes in general, but we might end up with an abstract class with no concrete methods, or, even worse, a very deep inheritance tree (AbstractRegression >> AbstractUpdatingLinearRegression >> PanelRegression >> OneWayFixedEffects).

That being said, Phil and Ted, you are definitely the experts on the design; I just thought I would add my opinion to the mix.

On the features front, as we deliberate over the design issues, it's important that we keep an eye on what is missing. Here are some features which I believe should be in the regression package:

1. An SVD-based OLS regression (because sometimes messing with eigenvalues is a must).
2. Functionality to impose arbitrary linear equality restrictions of the form coeff * B = constants, perhaps through a regress method with the following signature:
   public RegressionResult regress(RealMatrix coeff, RealVector constants);
3. Related to (2): linear hypothesis testing.
4. Related to (2): estimates of the Lagrange multipliers and their variance-covariance matrix.
5. Robust variance-covariance estimators.
6. Perhaps panel regression. This is a larger undertaking than plain regression. We would need to track the other dimensions of an observation (if we are studying income for a group of people, we might track the individual, the year, race and so forth as other dimensions which impose level shifts in the hyperplane).
   The regression might need a pull mechanism to make two passes through the data. The first pass computes things like means or augments the design matrix with dummy variables; the second pass runs the regression on the transformed data.
7. Some sort of redundancy indicator. This is especially important when you allow for parameter restrictions, since you want to know which parameter is not being used.
8. Some meta structure to allow for stepwise regression.

-Greg
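To make the interface argument concrete, here is a minimal Java sketch of a "minimum contract" that different techniques implement, so a caller can swap OLS for ridge without recoding. The names `LinearEstimator`, `OrdinaryLeastSquares` and `RidgeRegression` are hypothetical (not existing Commons Math API), and the model is cut down to a single regressor through the origin purely for brevity.

```java
import java.util.List;

// Hypothetical minimal contract: every technique estimates b in y = b * x + e.
// (One regressor, no intercept, to keep the sketch short.)
interface LinearEstimator {
    double estimate(double[] x, double[] y);
}

// Plain OLS: b = sum(x*y) / sum(x*x).
class OrdinaryLeastSquares implements LinearEstimator {
    public double estimate(double[] x, double[] y) {
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
        }
        return sxy / sxx;
    }
}

// Ridge: the same normal equation with a penalty added, b = sum(x*y) / (sum(x*x) + penalty).
class RidgeRegression implements LinearEstimator {
    private final double penalty;

    RidgeRegression(double penalty) {
        this.penalty = penalty;
    }

    public double estimate(double[] x, double[] y) {
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
        }
        return sxy / (sxx + penalty);
    }
}

class SwapDemo {
    public static void main(String[] args) {
        double[] x = {1, 2, 3};
        double[] y = {2, 4, 6};
        // The calling code only sees the interface, so techniques are interchangeable.
        for (LinearEstimator e : List.of(new OrdinaryLeastSquares(), new RidgeRegression(1.0))) {
            System.out.println(e.getClass().getSimpleName() + ": " + e.estimate(x, y));
        }
    }
}
```

Because callers depend only on `LinearEstimator`, adding a robust or panel variant would not change the calling code.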
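Items 2 and 4 can be sketched with the textbook restricted least squares solution. Assuming restrictions R B = r (R is the `coeff` matrix, r the right-hand-side vector), minimizing (Y - XB)'(Y - XB) + 2λ'(RB - r) gives λ = [R(X'X)^-1 R']^-1 (Rb - r) and b_R = b - (X'X)^-1 R'λ, where b is the unrestricted OLS estimate. The class and method names below are hypothetical. For brevity it inverts the normal equations with plain Gauss-Jordan elimination, which is exactly the kind of approach that loses accuracy on ill-conditioned data like Longley; a QR- or SVD-based solver would be the safer choice in a real package.

```java
import java.util.Arrays;

// Hypothetical sketch: least squares subject to R b = r, returning b_R and the Lagrange multipliers.
class RestrictedLeastSquares {

    static double[][] transpose(double[][] a) {
        double[][] t = new double[a[0].length][a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++) t[j][i] = a[i][j];
        return t;
    }

    static double[][] multiply(double[][] a, double[][] b) {
        double[][] c = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < b[0].length; j++)
                for (int p = 0; p < b.length; p++) c[i][j] += a[i][p] * b[p][j];
        return c;
    }

    // Gauss-Jordan inversion with partial pivoting (fine for a sketch, not for ill-conditioned X'X).
    static double[][] invert(double[][] a) {
        int n = a.length;
        double[][] m = new double[n][2 * n];                 // augmented [A | I]
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, m[i], 0, n);
            m[i][n + i] = 1.0;
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int row = col + 1; row < n; row++)
                if (Math.abs(m[row][col]) > Math.abs(m[pivot][col])) pivot = row;
            if (Math.abs(m[pivot][col]) < 1e-12) throw new ArithmeticException("matrix is singular");
            double[] tmp = m[col]; m[col] = m[pivot]; m[pivot] = tmp;
            double d = m[col][col];
            for (int j = 0; j < 2 * n; j++) m[col][j] /= d;
            for (int row = 0; row < n; row++) {
                if (row == col) continue;
                double f = m[row][col];
                for (int j = 0; j < 2 * n; j++) m[row][j] -= f * m[col][j];
            }
        }
        double[][] inv = new double[n][n];
        for (int i = 0; i < n; i++) System.arraycopy(m[i], n, inv[i], 0, n);
        return inv;
    }

    static double[][] column(double[] v) {
        double[][] c = new double[v.length][1];
        for (int i = 0; i < v.length; i++) c[i][0] = v[i];
        return c;
    }

    // Returns {restricted coefficients b_R, Lagrange multipliers lambda}, both as column vectors.
    static double[][][] regress(double[][] x, double[] y, double[][] coeff, double[] rhs) {
        double[][] xt = transpose(x);
        double[][] xtxInv = invert(multiply(xt, x));                           // (X'X)^-1
        double[][] b = multiply(xtxInv, multiply(xt, column(y)));              // unrestricted OLS b
        double[][] gap = multiply(coeff, b);                                   // R b ...
        for (int i = 0; i < rhs.length; i++) gap[i][0] -= rhs[i];              // ... minus r
        double[][] coeffT = transpose(coeff);
        double[][] middle = invert(multiply(multiply(coeff, xtxInv), coeffT)); // [R (X'X)^-1 R']^-1
        double[][] lambda = multiply(middle, gap);                             // Lagrange multipliers
        double[][] shift = multiply(multiply(xtxInv, coeffT), lambda);         // (X'X)^-1 R' lambda
        double[][] restricted = new double[b.length][1];
        for (int i = 0; i < b.length; i++) restricted[i][0] = b[i][0] - shift[i][0];
        return new double[][][] {restricted, lambda};
    }

    public static void main(String[] args) {
        // Made-up toy data: intercept plus two regressors.
        double[][] x = {{1, 1, 2}, {1, 2, 1}, {1, 3, 4}, {1, 4, 3}, {1, 5, 6}};
        double[] y = {4.1, 5.0, 9.2, 8.9, 13.1};
        double[][] coeff = {{0, 1, 1}};                                        // restriction: b1 + b2 = 1
        double[] rhs = {1.0};
        double[][][] out = regress(x, y, coeff, rhs);
        System.out.println("restricted b = " + Arrays.deepToString(out[0]));
        System.out.println("lambda       = " + Arrays.deepToString(out[1]));
    }
}
```

The multipliers satisfy X'(Y - X b_R) = R'λ, so a large λ indicates a restriction the data push against; the linear hypothesis test in item 3 would formalize that.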
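For item 6, one common way to absorb individual level shifts is the one-way fixed-effects (within) estimator, which maps naturally onto the two-pass idea: the first pass computes per-individual means, and the second regresses the demeaned data. This is a minimal sketch assuming a single regressor and string group ids; `WithinEstimator` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-pass sketch for y_it = a_i + b * x_it + e_it with a single regressor.
class WithinEstimator {

    // Pass 1: per-group means. Returns id -> {meanX, meanY}.
    static Map<String, double[]> groupMeans(String[] group, double[] x, double[] y) {
        Map<String, double[]> sums = new HashMap<>();       // id -> {sumX, sumY, count}
        for (int i = 0; i < y.length; i++) {
            double[] s = sums.computeIfAbsent(group[i], k -> new double[3]);
            s[0] += x[i];
            s[1] += y[i];
            s[2] += 1.0;
        }
        Map<String, double[]> means = new HashMap<>();
        for (Map.Entry<String, double[]> e : sums.entrySet()) {
            double[] s = e.getValue();
            means.put(e.getKey(), new double[] {s[0] / s[2], s[1] / s[2]});
        }
        return means;
    }

    // Pass 2: OLS slope on the demeaned (within-transformed) data; the group intercepts drop out.
    static double slope(String[] group, double[] x, double[] y) {
        Map<String, double[]> means = groupMeans(group, x, y);
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < y.length; i++) {
            double[] m = means.get(group[i]);
            double dx = x[i] - m[0];
            double dy = y[i] - m[1];
            sxy += dx * dy;
            sxx += dx * dx;
        }
        return sxy / sxx;
    }

    public static void main(String[] args) {
        // Made-up data: two individuals with different intercepts (1 and 8) and a common slope of 2.
        String[] id = {"A", "A", "A", "B", "B", "B"};
        double[] x = {1, 2, 3, 1, 2, 3};
        double[] y = {3, 5, 7, 10, 12, 14};
        System.out.println("within slope = " + slope(id, x, y)); // prints "within slope = 2.0"
    }
}
```

The group means could equally be replaced by dummy columns appended to the design matrix, the other first-pass option mentioned above; demeaning just avoids growing X with one column per individual.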