+1 This separation was the idea from the start. There is a trade-off between having highly configurable optimizers and ensuring that the right types of regularization can only be applied to optimization algorithms that support them.
It comes down to viewing the optimization framework mostly as a basis to build learners upon. We want to give users the freedom to choose their optimization algorithm when creating, for example, a multiple linear regression learner, but we have to ensure that the parameters they set for the optimizers are valid (e.g. it should not be possible to set L1 regularization when using L-BFGS).

On Thu, May 28, 2015 at 5:37 PM, Till Rohrmann <till.rohrm...@gmail.com> wrote:
> I think so too. Ok, I'll try to update the PR accordingly.
>
> On Thu, May 28, 2015 at 5:36 PM, Mikio Braun <mikiobr...@googlemail.com>
> wrote:
>
> > Ah yeah, I see.
> >
> > Yes, it's right that many algorithms perform quite differently
> > depending on the kind of regularization. The same holds for cutting
> > plane algorithms, which reduce to either linear or quadratic programs
> > depending on L1 or L2. Generally speaking, I think this is also not
> > surprising, as L1 is not differentiable everywhere and you'd have to
> > use different regularizations.
> >
> > So it probably makes sense to separate the loss from the cost function
> > (which is then only defined by the model and the loss function), and
> > have the regularization extra.
> >
> > -M
> >
> > --
> > Mikio Braun - http://blog.mikiobraun.de, http://twitter.com/mikiobraun
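
For illustration, here is a minimal Scala sketch (hypothetical names, not FlinkML's actual API) of how marker traits could restrict which regularization types an optimizer accepts, so that an invalid combination such as L1 with L-BFGS is rejected at compile time rather than checked at runtime:

```scala
// Sketch only: hypothetical types illustrating the separation of
// loss, model, and regularization discussed above.

sealed trait Regularization
// Marker for regularizers that are differentiable everywhere.
trait DifferentiableRegularization extends Regularization
case object NoRegularization extends DifferentiableRegularization
case class L2Regularization(lambda: Double) extends DifferentiableRegularization
// L1 is not differentiable everywhere, so it does not get the marker trait.
case class L1Regularization(lambda: Double) extends Regularization

// An optimizer declares the most general regularization it supports.
trait Optimizer[R <: Regularization] {
  def optimize(data: Seq[(Array[Double], Double)], reg: R): Array[Double]
}

// L-BFGS only accepts regularizers that are differentiable everywhere,
// so passing L1Regularization here is a compile-time error.
class LBFGS extends Optimizer[DifferentiableRegularization] {
  def optimize(data: Seq[(Array[Double], Double)],
               reg: DifferentiableRegularization): Array[Double] = ???
}

// A proximal SGD variant could handle L1 as well, so it accepts any Regularization.
class ProximalSGD extends Optimizer[Regularization] {
  def optimize(data: Seq[(Array[Double], Double)],
               reg: Regularization): Array[Double] = ???
}
```

With this kind of constraint, a learner can still let users plug in whichever optimizer they like, while the type system rules out regularization settings the chosen algorithm does not support.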