Hello all,

I have been using the Commons Math jar for a little over a year and am very impressed with it. I was wondering whether anyone is actively working on functionality for running regressions on very large data sets. The current OLS implementation performs an in-core QR decomposition followed by back substitution. While its solutions are typically accurate, the in-core design requires the entire design matrix to fit in memory, which limits the usefulness of these classes for large problems.
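For reference, the in-core approach can be written as follows, with X the n-by-p design matrix and y the response vector (this notation is assumed here, not taken from the code):

```latex
X = QR, \qquad R\,\hat{\beta} = Q^{\mathsf{T}} y
```

Here Q has orthonormal columns and R is p-by-p upper triangular, so the coefficient vector follows from R by back substitution. Computing the factorization this way needs all n*p entries of X in memory at once, which is where the size limit comes from.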
Looking through the code, most of an InputStream-based regression routine could respect the contract implicit in the MultipleLinearRegression interface. However, large regression problems are important enough that there should be a way to:

1. Wrap a potentially large data source, perhaps as an InputStream of some sort.
2. Have a separate contract with methods such as clear() (to discard whatever intermediate calculations are stored) and regress(), which generates immutable results that are not affected by further updates to the data.

I would appreciate any thoughts or comments, as well as suggestions about functionality already in Commons Math that might address some of the points I raised.

Thank you,
-Greg
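To make the two points above concrete, here is a minimal sketch of what such a contract might look like. Everything in it is an assumption: StreamingOls, its nested Result, addAll() and the "y,x1,x2,..." line format are hypothetical names, not Commons Math API. Instead of storing X, it folds each observation into an upper-triangular R with plain Givens rotations (a simplified take on the updating-QR idea, not Gentleman's square-root-free variant), so memory stays O(p^2) however many rows the stream holds.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical streaming OLS (not part of Commons Math): keeps only R and Q^T y in memory. */
final class StreamingOls {
    /** Immutable snapshot produced by regress(); later updates cannot change it. */
    static final class Result {
        private final double[] beta;
        private final long n;
        private final double sse;

        Result(double[] beta, long n, double sse) {
            this.beta = beta.clone();
            this.n = n;
            this.sse = sse;
        }
        double[] getBeta() { return beta.clone(); }
        long getN() { return n; }
        double getSumSquaredErrors() { return sse; }
    }

    private final boolean intercept;
    private final int p;           // number of columns, including the intercept column
    private final double[][] r;    // upper-triangular factor R (p x p)
    private final double[] qty;    // first p entries of Q^T y
    private double sse;            // residual sum of squares (valid once R has full rank)
    private long n;                // observations seen since the last clear()

    StreamingOls(int numRegressors, boolean intercept) {
        this.intercept = intercept;
        this.p = numRegressors + (intercept ? 1 : 0);
        this.r = new double[p][p];
        this.qty = new double[p];
    }

    /** Folds one observation into R using Givens rotations; the row itself is not kept. */
    void addObservation(double[] x, double y) {
        int off = intercept ? 1 : 0;
        if (x.length != p - off) throw new IllegalArgumentException("expected " + (p - off) + " regressors");
        double[] row = new double[p];
        if (intercept) row[0] = 1.0;
        System.arraycopy(x, 0, row, off, x.length);
        double yy = y;
        for (int i = 0; i < p; i++) {
            if (row[i] == 0.0) continue;
            double h = Math.hypot(r[i][i], row[i]);
            double c = r[i][i] / h, s = row[i] / h;
            for (int j = i; j < p; j++) {   // rotate row i of R against the incoming row
                double rij = r[i][j], xj = row[j];
                r[i][j] = c * rij + s * xj;
                row[j] = c * xj - s * rij;
            }
            double q = qty[i];
            qty[i] = c * q + s * yy;
            yy = c * yy - s * q;
        }
        sse += yy * yy;   // leftover part of y, orthogonal to the columns seen so far
        n++;
    }

    /** Reads "y,x1,x2,..." lines (a hypothetical format); the caller still owns the stream. */
    void addAll(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        for (String line = reader.readLine(); line != null; line = reader.readLine()) {
            if (line.trim().isEmpty()) continue;
            String[] f = line.split(",");
            double[] x = new double[f.length - 1];
            for (int k = 1; k < f.length; k++) x[k - 1] = Double.parseDouble(f[k].trim());
            addObservation(x, Double.parseDouble(f[0].trim()));
        }
    }

    /** Discards all intermediate state so the object can be reused. */
    void clear() {
        for (double[] rowOfR : r) Arrays.fill(rowOfR, 0.0);
        Arrays.fill(qty, 0.0);
        sse = 0.0;
        n = 0;
    }

    /** Solves R beta = Q^T y by back substitution and returns an immutable snapshot. */
    Result regress() {
        if (n < p) throw new IllegalStateException("need at least " + p + " observations");
        double[] beta = new double[p];
        for (int i = p - 1; i >= 0; i--) {
            // 1e-12 is an arbitrary absolute tolerance chosen for this sketch
            if (r[i][i] < 1e-12) throw new IllegalStateException("design matrix is rank deficient");
            double sum = qty[i];
            for (int j = i + 1; j < p; j++) sum -= r[i][j] * beta[j];
            beta[i] = sum / r[i][i];
        }
        return new Result(beta, n, sse);
    }
}
```

Because regress() copies the coefficients into a new Result, a fitted snapshot survives later addObservation() or clear() calls. Rotating rows into R costs a few more operations per row than accumulating X^T X, but it avoids squaring the condition number of the problem.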