Re: [statistics] Pull request for GLSMultipleLinearRegression

Eric Barnhill Thu, 23 May 2019 17:33:25 -0700

Hi Elena,

Thanks for this intriguing idea. As far as I knew, IRLS requires a
matrix. Can you point me to a citation where I can read about this
vector-based approach?


Thanks,
Eric


On Thu, May 23, 2019, 06:44 Елена Картышева <el.kartysh...@yandex.ru> wrote:

> Hello.
>
> I would like to propose a pull request that adds an option to use a
> variance vector instead of a covariance matrix. When the errors are
> uncorrelated but heteroscedastic, this avoids unnecessary memory usage
> and computation, making it possible to work with very large input
> matrices. Using a variance vector in such cases reduces time complexity
> from O(N^2) to O(N), where N is the number of observations, and
> dramatically reduces memory usage. For example, in my own work I needed
> to train a generalized linear model. The iteratively reweighted least
> squares (IRLS) algorithm requires a weighted regression with more than a
> million observations. The current implementation would require
> approximately 12 terabytes of memory, while the patched version needs
> only 8 megabytes. Since IRLS is an iterative algorithm, a million-fold
> complexity reduction is also pretty handy.
>
>
> --
> Sincerely yours, Elena Kartysheva.
>
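
For readers of the archive, here is a minimal sketch of the variance-vector
idea discussed above. It is not the code from the pull request and does not
use the Commons Math API. The class name DiagonalWls and the method fit are
hypothetical, chosen only for illustration. It assumes uncorrelated errors,
so the covariance matrix is diagonal and only its N diagonal entries (the
variances) need to be stored. The weighted normal equations are accumulated
row by row and solved by Gaussian elimination. A production implementation
would more likely use a QR decomposition for numerical stability.

```java
import java.util.Arrays;

/** Hypothetical illustration: weighted least squares from a variance vector. */
final class DiagonalWls {

    /**
     * Minimizes sum_i (y[i] - x[i] . beta)^2 / variance[i].
     * Only the N variances are stored; no N x N covariance matrix is built.
     */
    static double[] fit(double[][] x, double[] y, double[] variance) {
        int n = x.length;
        int p = x[0].length;
        // Augmented system [X'WX | X'Wy] with W = diag(1 / variance).
        double[][] a = new double[p][p + 1];
        for (int i = 0; i < n; i++) {
            double w = 1.0 / variance[i];
            for (int j = 0; j < p; j++) {
                double wx = w * x[i][j];
                for (int k = 0; k < p; k++) {
                    a[j][k] += wx * x[i][k];
                }
                a[j][p] += wx * y[i];
            }
        }
        // Forward elimination with partial pivoting.
        for (int c = 0; c < p; c++) {
            int pivot = c;
            for (int r = c + 1; r < p; r++) {
                if (Math.abs(a[r][c]) > Math.abs(a[pivot][c])) {
                    pivot = r;
                }
            }
            double[] tmp = a[c];
            a[c] = a[pivot];
            a[pivot] = tmp;
            for (int r = c + 1; r < p; r++) {
                double f = a[r][c] / a[c][c];
                for (int k = c; k <= p; k++) {
                    a[r][k] -= f * a[c][k];
                }
            }
        }
        // Back substitution.
        double[] beta = new double[p];
        for (int r = p - 1; r >= 0; r--) {
            double s = a[r][p];
            for (int k = r + 1; k < p; k++) {
                s -= a[r][k] * beta[k];
            }
            beta[r] = s / a[r][r];
        }
        return beta;
    }

    public static void main(String[] args) {
        // Data lie exactly on y = 1 + 2 * t; the first column is the intercept.
        double[][] x = {{1, 0}, {1, 1}, {1, 2}, {1, 3}};
        double[] y = {1, 3, 5, 7};
        double[] variance = {0.5, 1.0, 2.0, 4.0};
        System.out.println(Arrays.toString(fit(x, y, variance)));
    }
}
```

In this sketch the extra storage for the error structure is N doubles, and
one pass over the data costs O(N p^2) for p regressors, so it is linear in
the number of observations.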
