At Ted's suggestion I looked at LSMR in Mahout. In general, I have no
complaints about the algorithm or how it is coded up. I have seen the
algorithm in the PrimalDual solver that Michael Saunders et al. cooked up. I
believe the solver is part of the COIN project. I have nothing but praise
for it.

However, what I contacted Phil about was setting up some interfaces to
define a general contract so that we could code up different ways of
performing OLS. To wit, here is what I had in mind:

public interface UpdatingLinearRegression {
    public long getNobs();
    public void addData(double[] x, double y);
    public void addData(double[][] x, double[] y);
    public void clear();
    public RegressionResults regress() throws MathException;
    public RegressionResults regress(int[] variablesToInclude) throws MathException;
}

The other interface is:

public interface RegressionResults {
    public double getParameterEstimate(int index) throws IndexOutOfBoundsException;
    public double[] getParameterEstimates();
    public double getStdErrorOfEstimate(int index) throws IndexOutOfBoundsException;
    public double[] getStdErrorOfEstimates();
    public boolean isRedundant(int index) throws IndexOutOfBoundsException, MathException;
    public boolean[] getRedundant();
    public int getNumberOfParameters();
    public long getNobs();
    public double getTotalSumSquares();
    public double getRegressionSumSquares();
    public double getErrorSumSquares();
    public double getMeanSquareError();
    public double getRSquared();
}

The above functionality borrows liberally from the SimpleRegression class
and describes most of what a user would expect from a classical regression
analysis. What the interface buys us is the ability to support the many ways
of generating the results above: QR factorizations, in-place Gaussian
elimination, incremental SVD and so forth.
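
To make the idea concrete, here is a rough sketch of what one backend might
look like. It is only an illustration: MiniUpdatingRegression,
MiniRegressionResults and NormalEquationsRegression are made-up names, the
two interfaces are cut down from the ones above so the snippet compiles on
its own, and the singularity tolerance is an arbitrary placeholder. It
accumulates X'X and X'y as observations arrive and solves the normal
equations by Gaussian elimination with partial pivoting.

```java
// Hypothetical, trimmed-down stand-ins for the two interfaces above, kept
// small so that this sketch compiles on its own.
interface MiniRegressionResults {
    double[] getParameterEstimates();
    long getNobs();
    double getErrorSumSquares();
}

interface MiniUpdatingRegression {
    long getNobs();
    void addData(double[] x, double y);
    void clear();
    MiniRegressionResults regress();
}

// One possible backend (an assumption, not existing library code): accumulate
// X'X, X'y and y'y one observation at a time, then solve the normal equations
// by Gaussian elimination with partial pivoting.
final class NormalEquationsRegression implements MiniUpdatingRegression {
    private final int p;           // number of columns in x, constant included
    private final double[][] xtx;  // running X'X
    private final double[] xty;    // running X'y
    private double yty;            // running y'y
    private long nobs;

    NormalEquationsRegression(int p) {
        this.p = p;
        this.xtx = new double[p][p];
        this.xty = new double[p];
    }

    @Override public long getNobs() { return nobs; }

    @Override public void addData(double[] x, double y) {
        if (x.length != p) throw new IllegalArgumentException("expected " + p + " regressors");
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) xtx[i][j] += x[i] * x[j];
            xty[i] += x[i] * y;
        }
        yty += y * y;
        nobs++;
    }

    @Override public void clear() {
        for (double[] row : xtx) java.util.Arrays.fill(row, 0.0);
        java.util.Arrays.fill(xty, 0.0);
        yty = 0.0;
        nobs = 0;
    }

    @Override public MiniRegressionResults regress() {
        // Work on an augmented copy [X'X | X'y] so the accumulators stay intact.
        double[][] a = new double[p][p + 1];
        for (int i = 0; i < p; i++) {
            System.arraycopy(xtx[i], 0, a[i], 0, p);
            a[i][p] = xty[i];
        }
        // Forward elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            // Crude absolute tolerance, chosen only for this illustration.
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new IllegalStateException("design matrix is rank deficient");
            }
            double[] tmp = a[col]; a[col] = a[pivot]; a[pivot] = tmp;
            for (int r = col + 1; r < p; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) a[r][c] -= f * a[col][c];
            }
        }
        // Back substitution for the parameter estimates.
        final double[] beta = new double[p];
        for (int i = p - 1; i >= 0; i--) {
            double s = a[i][p];
            for (int j = i + 1; j < p; j++) s -= a[i][j] * beta[j];
            beta[i] = s / a[i][i];
        }
        // At the least-squares solution, SSE = y'y - beta'X'y.
        double sse = yty;
        for (int i = 0; i < p; i++) sse -= beta[i] * xty[i];
        final double errorSumSquares = sse;
        final long n = nobs;
        return new MiniRegressionResults() {
            @Override public double[] getParameterEstimates() { return beta.clone(); }
            @Override public long getNobs() { return n; }
            @Override public double getErrorSumSquares() { return errorSumSquares; }
        };
    }
}
```

Forming X'X squares the condition number of the problem, so this is the
least robust of the approaches listed. A QR or incremental SVD version
could sit behind the same interface without the caller changing anything.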

Thoughts?

-Greg
