Hello All,

I have been a user of the Commons Math jar for a little over a year and am
very impressed with it. I was wondering whether anyone is actively working
on functionality to run regressions on very large data sets. The current
implementation of the OLS routine is an in-core QR decomposition with
substitution. While the solutions are typically accurate, the in-core
nature limits the usefulness of these objects, because the whole data set
has to fit in memory at once.

Looking through the code, most of an InputStream-based regression routine
could respect the contract implicit in the MultipleLinearRegression
interface. However, large regression problems are important enough that
there should be a way to:

1. Wrap a potentially large data source, perhaps as an InputStream of some
sort.
2. Have a separate contract with methods like clear() (to clear whatever
intermediate calculations are stored) and regress(), which generates
immutable results that are not affected by further updates of the data.
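
To make the second point concrete, here is a minimal sketch of what such a
contract might look like. Everything in it is hypothetical: the class name
StreamingRegression and the method addObservation() are made up for
illustration and are not part of Commons Math. It also does not use QR.
For brevity it accumulates the normal equations (X'X and X'y) one row at a
time and solves them by Gaussian elimination, which is less numerically
stable than the QR approach in the existing OLS routine. An updating QR
scheme would be the more careful choice.

```java
import java.util.Arrays;

/** Hypothetical streaming OLS accumulator; NOT part of Commons Math. */
final class StreamingRegression {
    private final int p;          // number of columns, including the intercept
    private final double[][] xtx; // running X'X
    private final double[] xty;   // running X'y
    private long n;               // observations seen so far

    StreamingRegression(int numPredictors) {
        this.p = numPredictors + 1;
        this.xtx = new double[p][p];
        this.xty = new double[p];
    }

    /** Folds one observation into the running sums; the row is not retained. */
    void addObservation(double[] x, double y) {
        if (x.length != p - 1) {
            throw new IllegalArgumentException("expected " + (p - 1) + " predictors");
        }
        double[] row = new double[p];
        row[0] = 1.0; // intercept column
        System.arraycopy(x, 0, row, 1, x.length);
        for (int i = 0; i < p; i++) {
            xty[i] += row[i] * y;
            for (int j = 0; j < p; j++) {
                xtx[i][j] += row[i] * row[j];
            }
        }
        n++;
    }

    /** Discards all intermediate calculations. */
    void clear() {
        for (double[] r : xtx) {
            Arrays.fill(r, 0.0);
        }
        Arrays.fill(xty, 0.0);
        n = 0;
    }

    /** Solves on copies and returns a fresh array: {intercept, slope1, ...}. */
    double[] regress() {
        if (n < p) {
            throw new IllegalStateException("need at least " + p + " observations");
        }
        double[][] a = new double[p][];
        for (int i = 0; i < p; i++) {
            a[i] = xtx[i].clone();
        }
        double[] b = xty.clone();
        // Forward elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new ArithmeticException("X'X is singular or nearly so");
            }
            double[] tmpRow = a[col]; a[col] = a[pivot]; a[pivot] = tmpRow;
            double tmp = b[col]; b[col] = b[pivot]; b[pivot] = tmp;
            for (int r = col + 1; r < p; r++) {
                double f = a[r][col] / a[col][col];
                b[r] -= f * b[col];
                for (int c = col; c < p; c++) {
                    a[r][c] -= f * a[col][c];
                }
            }
        }
        // Back substitution.
        double[] beta = new double[p];
        for (int i = p - 1; i >= 0; i--) {
            double s = b[i];
            for (int j = i + 1; j < p; j++) {
                s -= a[i][j] * beta[j];
            }
            beta[i] = s / a[i][i];
        }
        return beta;
    }
}
```

A reader loop over an InputStream would call addObservation() once per
parsed row, so memory use depends on the number of predictors and not on
the number of rows. Because regress() works on copies of the running sums,
the array it returns is not changed by later calls to addObservation() or
clear().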

I would appreciate any thoughts or comments, as well as suggestions about
functionality already in Commons Math that might address some of the
points I raised.

Thank you,

-Greg
