Re: [math] Refactoring multiple regression classes

Phil Steitz Wed, 13 Jul 2011 17:35:34 -0700

On 7/13/11 5:15 PM, Greg Sterijevski wrote:
> Hello All,
>
> Sorry for being a bit slow on the uptake... I am still in the wilds of
> numerical imprecision with the Longley data. I am getting close to figuring
> out where the error is being accumulated.
>
> I agree that interfaces impose rigidity in the design. However, there are
> broad similarities in linear regression. Whether we are doing OLS, panel
> regression, ridge regression, robust regression, etc., the model is linear:
> Y = XB + e. The estimation technique may be complicated (or non-linear),
> but that is an implementation detail. I like interfaces because they force a
> discipline on the code. They force you to specify a minimum contract, which
> oftentimes makes larger problems more tractable. The end user should be able
> to swap out one technique for another with minimal recoding. It also makes
> it easy to write RMI stubs and make distributed calls.
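
To make the "swap one technique for another" argument concrete, here is a minimal sketch. Every name in it (LinearRegressionSketch, OlsThroughOrigin, RidgeThroughOrigin, SwapDemo) is hypothetical and is not part of the Commons Math API. To stay short it assumes a single regressor and no intercept, so B in Y = XB + e is one number.

```java
// Hypothetical names throughout; none of this is the Commons Math API.
interface LinearRegressionSketch {
    /** Estimate the slope b in y = x*b + e (one regressor, no intercept). */
    double estimateSlope(double[] x, double[] y);
}

/** Ordinary least squares: b = sum(x*y) / sum(x*x). */
class OlsThroughOrigin implements LinearRegressionSketch {
    @Override
    public double estimateSlope(double[] x, double[] y) {
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
        }
        return sxy / sxx;
    }
}

/** Ridge regression: b = sum(x*y) / (sum(x*x) + lambda). */
class RidgeThroughOrigin implements LinearRegressionSketch {
    private final double lambda;

    RidgeThroughOrigin(double lambda) {
        this.lambda = lambda;
    }

    @Override
    public double estimateSlope(double[] x, double[] y) {
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
        }
        return sxy / (sxx + lambda);
    }
}

class SwapDemo {
    /** Caller code depends only on the interface, so techniques are interchangeable. */
    static double fit(LinearRegressionSketch technique, double[] x, double[] y) {
        return technique.estimateSlope(x, y);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {2.0, 4.0, 6.0};
        System.out.println(fit(new OlsThroughOrigin(), x, y));        // prints 2.0
        System.out.println(fit(new RidgeThroughOrigin(14.0), x, y));  // prints 1.0
    }
}
```

The same swap works if the shared type is an abstract class, so the sketch illustrates the contract rather than settling the interface question.
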
>
> I like abstract classes in general, but we might end up with an abstract
> class with no concrete methods or, even worse, a very deep inheritance
> tree:
>   AbstractRegression >>
>     AbstractUpdatingLinearRegression >>
>       PanelRegression >>
>         OneWayFixedEffects


How exactly do interfaces make the hierarchy flatter in this case? 
I agree we should aim for as simple a structure as possible.  The
question is, what is that structure?
>
> That being said, Phil and Ted, you guys are definitely the experts on the
> design. I thought I would add my opinion to the mix.

There are no "experts" here - or maybe we are *all* experts :)  In
any case, your opinions are much appreciated.  Apart from the
interface/abstract class question at the top of the hierarchy, do
you have other suggestions on how to do this?  Can you see a simpler
model using interfaces instead of abstract classes at other levels?
>
> On the features front, as we deliberate over the design issues, it's important
> that we have an eye to what is missing.
This is probably the best argument against interfaces - they make it
basically impossible to add "what is missing" once you cut a release
including them.  The ideas below are great and illustrate how we
need to keep things as flexible as possible.  We should think about
how the refactored structure we are defining will accommodate them.
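
A rough sketch of the point about adding "what is missing" after a release, using hypothetical class names (AbstractRegressionSketch, MeanOnlyRegression, EvolutionDemo) that are not Commons Math API. The abstract class gains a residuals() method with a default implementation in a later release, and a subclass written against the first release keeps compiling and picks the method up. Adding the same method to a released interface would break every existing implementor. (Java 8 later added default methods to interfaces, which softens this, but that was not available at the time of this thread.)

```java
import java.util.Arrays;

// Hypothetical names throughout; not the Commons Math API.
abstract class AbstractRegressionSketch {
    /** Part of the "first release": subclasses supply the estimation technique. */
    abstract double[] estimateCoefficients(double[][] x, double[] y);

    /** Added in a "later release": existing subclasses inherit it unchanged. */
    double[] residuals(double[][] x, double[] y) {
        double[] b = estimateCoefficients(x, y);
        double[] e = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            double fitted = 0.0;
            for (int j = 0; j < b.length; j++) {
                fitted += x[i][j] * b[j];
            }
            e[i] = y[i] - fitted;
        }
        return e;
    }
}

/** Written against the "first release"; knows nothing about residuals(). */
class MeanOnlyRegression extends AbstractRegressionSketch {
    @Override
    double[] estimateCoefficients(double[][] x, double[] y) {
        // Intercept-only model: the single coefficient is the sample mean of y.
        double sum = 0.0;
        for (double v : y) {
            sum += v;
        }
        return new double[] {sum / y.length};
    }
}

class EvolutionDemo {
    public static void main(String[] args) {
        double[][] x = {{1.0}, {1.0}, {1.0}};  // a single column of ones
        double[] y = {1.0, 2.0, 3.0};
        AbstractRegressionSketch r = new MeanOnlyRegression();
        System.out.println(Arrays.toString(r.residuals(x, y)));  // prints [-1.0, 0.0, 1.0]
    }
}
```
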

Phil
>  Here are some features which I
> believe should be in the regression package:
>
> 1. An SVD-based OLS regression. (because sometimes messing with eigenvalues
> is a must)
> 2. A functionality to impose arbitrary linear equality restrictions:
>      Perhaps a regress method with the following signature,
>      public RegressionResult regress(RealMatrix coeff, RealVector constants);
> 3. Related to (2) linear hypothesis testing
> 4. Related to (2) estimates of the Lagrangian and its variance-covariance
> matrix
> 5. Robust variance-covariance estimators
> 6. Perhaps panel regression.
>      This one is a bit larger than regression. We would need to track the
> other dimensions of an observation (if we are studying income for a group
> of people, we might track the individual, the year, race and so forth as
> other dimensions which impose level shifts in the hyperplane). The
> regression might need a pull mechanism to make two passes through the data.
> First pass builds things like means or augments the design matrix with dummy
> variables. The second pass would actually run the regression on the
> transformed data.
> 7. Some sort of redundancy indicator. This is especially important when you
> allow for parameter restrictions since you want to know which parameter is
> not being used.
> 8. Some meta structure to allow for stepwise regression.
>
> -Greg
>
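
The two-pass idea in item 6 can be sketched for the simplest case, a one-way fixed-effects (within) estimator with a single regressor. The class and method names (FixedEffectsSketch, withinSlope) are hypothetical, and the sketch demeans by group rather than augmenting the design matrix with dummy variables. The first pass accumulates per-group means; the second runs the regression on the demeaned data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the Commons Math API.
class FixedEffectsSketch {
    /**
     * Within estimator of the slope in y = a_g + b*x + e, where a_g is a
     * level shift specific to each group g.
     */
    static double withinSlope(String[] group, double[] x, double[] y) {
        // Pass 1: accumulate {sum of x, sum of y, count} for each group.
        Map<String, double[]> sums = new HashMap<>();
        for (int i = 0; i < x.length; i++) {
            double[] s = sums.computeIfAbsent(group[i], k -> new double[3]);
            s[0] += x[i];
            s[1] += y[i];
            s[2] += 1.0;
        }
        // Pass 2: regress group-demeaned y on group-demeaned x.
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            double[] s = sums.get(group[i]);
            double dx = x[i] - s[0] / s[2];
            double dy = y[i] - s[1] / s[2];
            sxy += dx * dy;
            sxx += dx * dx;
        }
        return sxy / sxx;
    }

    public static void main(String[] args) {
        // Group A follows y = 2x + 1, group B follows y = 2x + 10.
        String[] group = {"A", "A", "B", "B"};
        double[] x = {1.0, 2.0, 1.0, 3.0};
        double[] y = {3.0, 5.0, 12.0, 16.0};
        System.out.println(withinSlope(group, x, y));  // prints 2.0
    }
}
```

The demeaning removes the group-specific level shifts, so the common slope is recovered even though the two groups have different intercepts.
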


