[math] Improving numerics in OLSMultipleLinearRegression

Phil Steitz Sun, 08 Jun 2008 19:17:46 -0700

While clear and elegant from a matrix algebra standpoint, the "naive" implementation in OLSMultipleLinearRegression has bad numerical qualities. It is well known that solving the normal equations directly does not give good numerics. I just added some tests to actually verify parameter values, using the classic "Longley" dataset, for which NIST provides certified statistics. This is a "hard" design matrix. R was able to get to within 1E-8 of the certified parameter values. OLSMultipleLinearRegression can only get to within 1E-1.

We have talked in the past about providing an implementation based on QR decomposition. Anyone up for using the QR decomposition that we now have to do this? I really think we need to do it (or something else to improve numerics) before releasing this class. I will get to it eventually, but am a little pegged at the moment. I will review and apply patches if someone is willing to do the implementation. I can also explain here or offline how the R tests and NIST datasets work, as these are useful in validating code.
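To make the QR idea concrete, here is a minimal, self-contained sketch of least squares via Householder QR. It avoids forming X'X, whose condition number is the square of that of X. The class name QrLeastSquares and its solve method are hypothetical, written in plain Java for illustration only; this is not the existing [math] QR decomposition class and not a proposed patch, and I have not run it against the Longley data.

```java
/** Hypothetical sketch; not part of commons-math. */
public final class QrLeastSquares {

    private QrLeastSquares() { }

    /**
     * Minimizes ||y - X b||_2 using Householder QR.
     * Assumes X is n x p with n >= p and full column rank.
     */
    public static double[] solve(double[][] x, double[] y) {
        int n = x.length;
        int p = x[0].length;
        // Work on copies so the caller's data is left untouched.
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++) {
            a[i] = x[i].clone();
        }
        double[] rhs = y.clone();

        for (int k = 0; k < p; k++) {
            // Euclidean norm of column k from the diagonal down.
            double norm = 0.0;
            for (int i = k; i < n; i++) {
                norm = Math.hypot(norm, a[i][k]);
            }
            if (norm == 0.0) {
                throw new IllegalArgumentException("zero column: rank deficient");
            }
            // Pick the sign that avoids cancellation in the first entry of v.
            double alpha = a[k][k] > 0 ? -norm : norm;
            // Householder vector v = column - alpha * e_k, stored in place.
            a[k][k] -= alpha;
            double vtv = 0.0;
            for (int i = k; i < n; i++) {
                vtv += a[i][k] * a[i][k];
            }
            // Apply H = I - 2 v v' / (v'v) to the remaining columns.
            for (int j = k + 1; j < p; j++) {
                double dot = 0.0;
                for (int i = k; i < n; i++) {
                    dot += a[i][k] * a[i][j];
                }
                double f = 2.0 * dot / vtv;
                for (int i = k; i < n; i++) {
                    a[i][j] -= f * a[i][k];
                }
            }
            // Apply the same reflection to the right-hand side.
            double dot = 0.0;
            for (int i = k; i < n; i++) {
                dot += a[i][k] * rhs[i];
            }
            double f = 2.0 * dot / vtv;
            for (int i = k; i < n; i++) {
                rhs[i] -= f * a[i][k];
            }
            // The reflected column is alpha * e_k; record the diagonal of R.
            a[k][k] = alpha;
        }

        // Back substitution on the upper-triangular R (top p rows of a).
        double[] b = new double[p];
        for (int k = p - 1; k >= 0; k--) {
            double s = rhs[k];
            for (int j = k + 1; j < p; j++) {
                s -= a[k][j] * b[j];
            }
            b[k] = s / a[k][k];
        }
        return b;
    }
}
```

For example, with rows {1, t} for t = 0..3 and y = 1 + 2t, solve should recover an intercept near 1 and a slope near 2. A real implementation would also want a proper rank check with a tolerance rather than the exact-zero test used here.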

Another thing that we should think about before releasing any of this stuff is the completeness of the API. Many standard regression statistics are missing. If we are going to stick with the Interface / Implementation setup, we need to get the right stuff into the interface. It is also awkward to have to insert "1"s in the design matrix to get an intercept term computed. This is convenient for implementation, but awkward for users. A more natural setup (IMHO) would be to expose a "noIntercept" or "hasIntercept" property for the model.
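As a rough illustration of the "hasIntercept" idea, the column of ones could be added internally, so users pass only their regressors. The DesignMatrix class and its build method below are hypothetical names for this sketch, not anything in the current API.

```java
/** Hypothetical sketch; not part of commons-math. */
public final class DesignMatrix {

    private DesignMatrix() { }

    /**
     * Returns a copy of the regressors, with a leading column of ones
     * when hasIntercept is true. Assumes a non-empty rectangular array.
     */
    public static double[][] build(double[][] regressors, boolean hasIntercept) {
        int n = regressors.length;
        int k = regressors[0].length;
        int offset = hasIntercept ? 1 : 0;
        double[][] x = new double[n][k + offset];
        for (int i = 0; i < n; i++) {
            if (hasIntercept) {
                x[i][0] = 1.0; // intercept column
            }
            System.arraycopy(regressors[i], 0, x[i], offset, k);
        }
        return x;
    }
}
```

With this kind of helper behind a model property, the user-facing data would contain only the regressors, and the intercept would show up as the first estimated parameter.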



