For linear regression, the main tasks are computing the matrix X^T X (essentially the covariance matrix) and the vector X^T y, both of which parallelize well. What remains is solving a linear system whose dimension equals the number of features. So if the number of features is small, it makes sense to do the aggregation in Flink and then solve the system directly.
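
Roughly what I have in mind, as a first sketch (untested; it assumes Breeze for the local solve and Flink's Scala DataSet API, and the toy data set and names are mine):

    import org.apache.flink.api.scala._
    import breeze.linalg.{DenseMatrix, DenseVector}

    object DirectLinearRegression {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment

        // Toy data set: y = 2*x1 + 3*x2, just for illustration.
        val data = env.fromCollection(Seq(
          (DenseVector(1.0, 1.0), 5.0),
          (DenseVector(2.0, 1.0), 7.0),
          (DenseVector(1.0, 2.0), 8.0),
          (DenseVector(3.0, 2.0), 12.0)
        ))

        // Distributed part: accumulate X^T X and X^T y in one map/reduce pass.
        val (gram, xty) = data
          .map { p => (p._1 * p._1.t, p._1 * p._2) }  // outer product, scaled x
          .reduce { (a, b) => (a._1 + b._1, a._2 + b._2) }
          .collect()
          .head

        // Local part: solve the d x d normal equations, cheap for small d.
        val weights: DenseVector[Double] = gram \ xty
        println(weights)  // should come out close to (2.0, 3.0)
      }
    }

The only distributed work is the map/reduce; everything after collect() is a d x d problem on the client, so the single-machine solve never sees the number of data points.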
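And regarding Ted's AdaGrad point below, the per-coordinate update is simple enough to sketch as well (plain Breeze; the function name and signature are mine, not anything in FlinkML):

    import breeze.linalg.DenseVector

    // One AdaGrad-style step: a nearly divergent gradient inflates the
    // accumulator, which immediately shrinks the effective step size.
    def adagradStep(
        w: DenseVector[Double],      // current weights
        grad: DenseVector[Double],   // loss gradient at w
        accum: DenseVector[Double],  // running sum of squared gradients
        eta: Double = 0.1,           // initial learning rate
        eps: Double = 1e-8): (DenseVector[Double], DenseVector[Double]) = {
      val newAccum = accum + grad.map(g => g * g)
      val scaled = DenseVector.tabulate(w.length) { i =>
        grad(i) / (math.sqrt(newAccum(i)) + eps)
      }
      (w - scaled * eta, newAccum)
    }

This is what makes an initially high learning rate relatively safe: one bad step and the effective rate drops for good.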
Working on some more complete example code for this one...

On Thu, Jun 4, 2015 at 4:51 PM, Till Rohrmann <trohrm...@apache.org> wrote:

> I agree that given a small data set it's probably better to solve the
> linear regression problem directly. However, I'm not so sure how well
> this performs if the data gets really big (more in terms of the number
> of data points). But maybe we can find a sweet spot at which to switch
> between the two methods. And maybe a distributed conjugate gradient
> method can also beat SGD if the data is too large to be processed on a
> single machine.
>
> Until we have AdaGrad or another more robust learning rate strategy, we
> could also deactivate the default value for simple SGD. This makes users
> aware that they have to tweak this parameter.
>
> On Thu, Jun 4, 2015 at 2:54 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
>> On Thu, Jun 4, 2015 at 1:26 PM, Till Rohrmann <trohrm...@apache.org>
>> wrote:
>>
>> > Maybe the default learning rate of 0.1 is also set too high.
>>
>> Could be.
>>
>> But grid search on the learning rate is pretty standard practice. Running
>> multiple learning engines at the same time with different learning rates
>> is pretty plausible.
>>
>> Also, something like AdaGrad will knock down a high learning rate very
>> quickly if you get a nearly divergent step. This can make initially high
>> learning rates quite plausible.

-- 
Mikio Braun - http://blog.mikiobraun.de, http://twitter.com/mikiobraun