+1 This separation was the idea from the start. There is a trade-off between having highly configurable optimizers and ensuring that the right types of regularization can only be applied to optimization algorithms that support them.
It comes down to viewing the optimization framework mostly as a basis to build learners upon. We want to give users the freedom to choose their optimization algorithm when creating, for example, a multiple linear regression learner, but we have to ensure that the parameters they set for the optimizers are valid (e.g. it should not be possible to set L1 regularization when using L-BFGS).

On Thu, May 28, 2015 at 5:37 PM, Till Rohrmann <till.rohrm...@gmail.com> wrote:
> I think so too. Ok, I'll try to update the PR accordingly.
>
> On Thu, May 28, 2015 at 5:36 PM, Mikio Braun <mikiobr...@googlemail.com>
> wrote:
>
> > Ah yeah, I see.
> >
> > Yes, it's right that many algorithms perform quite differently
> > depending on the kind of regularization. The same holds for cutting
> > plane algorithms, which reduce to either linear or quadratic programs
> > depending on L1 or L2. Generally speaking, I think this is also not
> > surprising, as L1 is not differentiable everywhere and you'd have to
> > use different regularizations.
> >
> > So it probably makes sense to separate the loss from the cost function
> > (which is then only defined by the model and the loss function), and
> > have the regularization extra.
> >
> > -M
> >
> > --
> > Mikio Braun - http://blog.mikiobraun.de, http://twitter.com/mikiobraun
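
For illustration, here is a minimal Scala sketch (hypothetical names, not FlinkML's actual API) of how marker traits could restrict which regularization types an optimizer accepts, so that an invalid combination such as L1 with L-BFGS is rejected at compile time rather than checked at runtime:

```scala
// Sketch only: hypothetical types illustrating the separation of
// loss, model, and regularization discussed above.

sealed trait Regularization
// Marker for regularizers that are differentiable everywhere.
trait DifferentiableRegularization extends Regularization
case object NoRegularization extends DifferentiableRegularization
case class L2Regularization(lambda: Double) extends DifferentiableRegularization
// L1 is not differentiable everywhere, so it does not get the marker trait.
case class L1Regularization(lambda: Double) extends Regularization

// An optimizer declares the most general regularization it supports.
trait Optimizer[R <: Regularization] {
  def optimize(data: Seq[(Array[Double], Double)], reg: R): Array[Double]
}

// L-BFGS only accepts regularizers that are differentiable everywhere,
// so passing L1Regularization here is a compile-time error.
class LBFGS extends Optimizer[DifferentiableRegularization] {
  def optimize(data: Seq[(Array[Double], Double)],
               reg: DifferentiableRegularization): Array[Double] = ???
}

// A proximal SGD variant could handle L1 as well, so it accepts any Regularization.
class ProximalSGD extends Optimizer[Regularization] {
  def optimize(data: Seq[(Array[Double], Double)],
               reg: Regularization): Array[Double] = ???
}
```

With this kind of constraint, a learner can still let users plug in whichever optimizer they like, while the type system rules out regularization settings the chosen algorithm does not support.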