At Ted's suggestion I looked at LSMR in Mahout. In general, I have no complaints about the algorithm or how it is coded up. I have seen the algorithm in the PrimalDual solver that Michael Saunders et al. cooked up. I believe the solver is part of the COIN project. I have nothing but praise for it.
However, what I contacted Phil about was setting up some interfaces to define a general contract, so that we could code up different ways of performing OLS. To wit, here is what I had in mind:

    public interface UpdatingLinearRegression {
        public long getNobs();
        public void addData(double[] x, double y);
        public void addData(double[][] x, double[] y);
        public void clear();
        public RegressionResults regress() throws MathException;
        public RegressionResults regress(int[] variablesToInclude) throws MathException;
    }

The other interface is:

    public interface RegressionResults {
        public double getParameterEstimate(int index) throws IndexOutOfBoundsException;
        public double[] getParameterEstimates();
        public double getStdErrorOfEstimate(int index) throws IndexOutOfBoundsException;
        public double[] getStdErrorOfEstimates();
        public boolean isRedundant(int index) throws IndexOutOfBoundsException, MathException;
        public boolean[] getRedundant();
        public int getNumberOfParameters();
        public long getNobs();
        public double getTotalSumSquares();
        public double getRegressionSumSquares();
        public double getErrorSumSquares();
        public double getMeanSquareError();
        public double getRSquared();
    }

Borrowing liberally from the SimpleRegression class, the above functionality describes most of what a user would expect from a classical regression analysis. What the interface buys us is the ability to support the many ways to generate the results above: QR factorizations, in-place Gaussian elimination, incremental SVD and so forth.

Thoughts?

-Greg
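
P.S. To make the idea of an updating contract a bit more concrete, here is a rough, self-contained sketch of one possible backend. It is only an illustration, not proposed code: the interface is a trimmed-down, hypothetical stand-in for UpdatingLinearRegression (regress() returns the bare coefficient array instead of a RegressionResults), the class names and the 1e-12 pivot tolerance are made up, and it assumes the caller supplies a constant 1.0 column when an intercept is wanted. It accumulates X'X and X'y as observations arrive and solves the normal equations by Gaussian elimination with partial pivoting.

```java
import java.util.Arrays;

public class NormalEquationsSketch {

    /** Trimmed-down, hypothetical stand-in for the proposed contract. */
    interface UpdatingRegression {
        long getNobs();
        void addData(double[] x, double y);
        void clear();
        double[] regress();
    }

    /** Accumulates X'X and X'y one observation at a time. */
    static class NormalEquationsRegression implements UpdatingRegression {
        private final int p;          // number of regressors
        private final double[][] xtx; // running X'X
        private final double[] xty;   // running X'y
        private long nobs;

        NormalEquationsRegression(int p) {
            this.p = p;
            this.xtx = new double[p][p];
            this.xty = new double[p];
        }

        public long getNobs() { return nobs; }

        public void addData(double[] x, double y) {
            if (x.length != p) {
                throw new IllegalArgumentException("expected " + p + " regressors");
            }
            for (int i = 0; i < p; i++) {
                xty[i] += x[i] * y;
                for (int j = 0; j < p; j++) {
                    xtx[i][j] += x[i] * x[j];
                }
            }
            nobs++;
        }

        public void clear() {
            for (double[] row : xtx) Arrays.fill(row, 0.0);
            Arrays.fill(xty, 0.0);
            nobs = 0;
        }

        public double[] regress() {
            // Work on copies so more data can still be added afterwards.
            double[][] a = new double[p][];
            for (int i = 0; i < p; i++) a[i] = xtx[i].clone();
            double[] b = xty.clone();

            // Forward elimination with partial pivoting.
            for (int col = 0; col < p; col++) {
                int pivot = col;
                for (int r = col + 1; r < p; r++) {
                    if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
                }
                if (Math.abs(a[pivot][col]) < 1e-12) { // assumed tolerance
                    throw new ArithmeticException("X'X is numerically singular");
                }
                double[] tmpRow = a[col]; a[col] = a[pivot]; a[pivot] = tmpRow;
                double tmpVal = b[col]; b[col] = b[pivot]; b[pivot] = tmpVal;
                for (int r = col + 1; r < p; r++) {
                    double f = a[r][col] / a[col][col];
                    b[r] -= f * b[col];
                    for (int c = col; c < p; c++) a[r][c] -= f * a[col][c];
                }
            }

            // Back substitution.
            double[] beta = new double[p];
            for (int i = p - 1; i >= 0; i--) {
                double s = b[i];
                for (int j = i + 1; j < p; j++) s -= a[i][j] * beta[j];
                beta[i] = s / a[i][i];
            }
            return beta;
        }
    }

    public static void main(String[] args) {
        UpdatingRegression reg = new NormalEquationsRegression(2);
        // Data generated from y = 1 + 2*x; the first column is the constant term.
        reg.addData(new double[] {1.0, 0.0}, 1.0);
        reg.addData(new double[] {1.0, 1.0}, 3.0);
        reg.addData(new double[] {1.0, 2.0}, 5.0);
        System.out.println(Arrays.toString(reg.regress()));
    }
}
```

A QR-based or incremental-SVD backend would implement the same methods, so calling code would not need to change. Forming X'X squares the condition number of the problem, which is one reason to want those alternatives behind a common interface.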