+1
This separation was the idea from the start; there is a trade-off between
having highly configurable optimizers and ensuring that the right types of
regularization can only be applied to optimization algorithms that support
them.
It comes down to viewing the optimization framework mostly as a b
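One way to get that kind of type-level safety (purely a sketch; none of these class names are from the actual PR) is to encode "has a usable gradient" as a subtype of the regularization trait, so a plain gradient solver can only accept penalties that support it:

```scala
// Hypothetical sketch, not the actual Flink ML API: only differentiable
// penalties extend DiffRegularization, so a solver that takes a plain
// gradient step can require one at compile time.
trait Regularization {
  def regLoss(weights: Vector[Double]): Double
}

trait DiffRegularization extends Regularization {
  def regGradient(weights: Vector[Double]): Vector[Double]
}

class L2Regularization(lambda: Double) extends DiffRegularization {
  def regLoss(w: Vector[Double]): Double = 0.5 * lambda * w.map(x => x * x).sum
  def regGradient(w: Vector[Double]): Vector[Double] = w.map(lambda * _)
}

class L1Regularization(lambda: Double) extends Regularization {
  // deliberately no regGradient: L1 is not differentiable at zero
  def regLoss(w: Vector[Double]): Double = lambda * w.map(math.abs).sum
}

// A solver performing ordinary gradient steps demands DiffRegularization,
// so handing it an L1Regularization is rejected by the compiler.
class PlainGradientSolver(reg: DiffRegularization)
```

The compile-time rejection is the point: the mismatch between solver and penalty never becomes a runtime error.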
I think so too. Ok, I'll try to update the PR accordingly.
On Thu, May 28, 2015 at 5:36 PM, Mikio Braun wrote:
Ah yeah, I see.
Yes, it's right that many algorithms perform quite differently
depending on the kind of regularization. The same holds for cutting
plane algorithms, which either reduce to linear or quadratic programs
depending on L1 or L2. Generally speaking, I think this is also not
surprising
Yes, GradientDescent == (batch-)SGD.
That was also my first idea of how to implement it. However, what happens
if the regularization is specific to the algorithm actually used? For
example, for L-BFGS with L1 regularization you have a different
`parameterUpdate` step (Orthant-wise Limited Memory Qu
GradientDescent is just the (batch-)SGD optimizer, right? Actually,
I think the parameter update should be done by a
RegularizationFunction.
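As a sketch of that idea (all names are illustrative, not from the PR): if the regularization function owns the weight update, an L2 penalty can fold into the ordinary gradient step, while an L1 penalty can perform a proximal soft-thresholding update that a plain gradient step cannot express.

```scala
trait RegularizationFunction {
  // take one step from weights w, given the loss gradient g and step size eta
  def update(w: Vector[Double], g: Vector[Double], eta: Double): Vector[Double]
}

class L2Update(lambda: Double) extends RegularizationFunction {
  // the L2 term is differentiable, so it just adds lambda * w to the gradient
  def update(w: Vector[Double], g: Vector[Double], eta: Double): Vector[Double] =
    w.zip(g).map { case (wi, gi) => wi - eta * (gi + lambda * wi) }
}

class L1Update(lambda: Double) extends RegularizationFunction {
  // proximal step: gradient step on the loss only, then shrink toward zero
  def update(w: Vector[Double], g: Vector[Double], eta: Double): Vector[Double] =
    w.zip(g).map { case (wi, gi) =>
      val z = wi - eta * gi
      math.signum(z) * math.max(0.0, math.abs(z) - eta * lambda)
    }
}
```

The solver then only calls `update` and never needs to know which penalty it is running under.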
IMHO the structure should be like this:
GradientDescent
- collects gradient and regularization updates from
CostFunction
  LinearModelCostFunction
  - i
Hey Mikio,
yes, you’re right. The SGD only needs to know the gradient of the loss
function and some means to update the weights in accordance with the
regularization scheme. Additionally, we also need to be able to compute the
loss for the convergence criterion.
That’s also how it is implemented in
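For concreteness, here is a minimal batch gradient descent loop covering exactly those three ingredients (loss gradient, a weight-update rule, and the loss itself for the convergence check). Everything below is an illustrative sketch, not the actual implementation:

```scala
// Minimal batch gradient descent with a loss-based convergence criterion.
// grad/loss are supplied by the caller; names are made up for illustration.
def sgd(data: Seq[(Double, Double)],
        grad: (Double, (Double, Double)) => Double,
        loss: (Double, Seq[(Double, Double)]) => Double,
        eta: Double, tol: Double, maxIter: Int): Double = {
  var w = 0.0
  var prev = loss(w, data)
  var i = 0
  var converged = false
  while (i < maxIter && !converged) {
    val g = data.map(grad(w, _)).sum        // batch gradient of the loss
    w -= eta * g                            // weight update
    val cur = loss(w, data)
    converged = math.abs(prev - cur) < tol  // convergence criterion
    prev = cur
    i += 1
  }
  w
}

// example: fit y = 2x with squared loss
val data = Seq((1.0, 2.0), (2.0, 4.0))
def sqGrad(w: Double, xy: (Double, Double)): Double = {
  val (x, y) = xy; (w * x - y) * x
}
def sqLoss(w: Double, d: Seq[(Double, Double)]): Double =
  d.map { case (x, y) => val e = w * x - y; 0.5 * e * e }.sum
val wFit = sgd(data, sqGrad, sqLoss, eta = 0.1, tol = 1e-12, maxIter = 1000)
```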
[Ok, so maybe this is exactly what is implemented, sorry if I'm just
repeating you... ]
So

    C(w, xys) = C_reg(w) + sum over xys of loss(w, xy)

and the gradient is

    grad C(w, xys) = grad C_reg(w) + sum over xys of grad loss(w, xy)

For some regularization functions, regularization is better performed
by some explicit op
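In code, that decomposition might look like the following (squared loss and an L2 penalty are picked purely for illustration; the helper names are not from the PR):

```scala
// Illustrative only: C(w) = C_reg(w) + sum of per-example losses,
// and the gradient decomposes the same way, term by term.
def regLoss(w: Double, lambda: Double): Double = 0.5 * lambda * w * w
def regGrad(w: Double, lambda: Double): Double = lambda * w

def loss(w: Double, xy: (Double, Double)): Double = {
  val (x, y) = xy; val e = w * x - y; 0.5 * e * e
}
def gradLoss(w: Double, xy: (Double, Double)): Double = {
  val (x, y) = xy; (w * x - y) * x
}

def cost(w: Double, lambda: Double, xys: Seq[(Double, Double)]): Double =
  regLoss(w, lambda) + xys.map(loss(w, _)).sum
def gradCost(w: Double, lambda: Double, xys: Seq[(Double, Double)]): Double =
  regGrad(w, lambda) + xys.map(gradLoss(w, _)).sum
```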
Oh wait.. continuing to type. I accidentally sent the message out too early.
On Thu, May 28, 2015 at 4:03 PM, Mikio Braun wrote:
Hi Till and Theodore,
I think the code is cleaned up a lot now, introducing the
mapWithBcVariable helped a lot.
I also get that the goal was to make the cost function for learning
linear models nicely configurable. My main concern was that the solver
itself was already bound too specifically to the ki
What tweaks would those be? I mean, what is required to implement L-BFGS?
I guess that we won’t get rid of the case statements, because we have to
decide between two code paths: one with and the other without a convergence
criterion. But I think that by pulling each branch into its own function, it
becomes cl
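As a sketch of that refactoring (the function names and the dummy loop bodies are made up, not from the PR): the match over the two code paths stays in one place, but each branch becomes a single call to a well-named function.

```scala
// One match selects the code path; each branch is its own function.
def optimize(maxIter: Int, convergenceTol: Option[Double]): Int =
  convergenceTol match {
    case Some(tol) => iterateWithConvergence(maxIter, tol)
    case None      => iterateFixed(maxIter)
  }

// runs a fixed number of iterations; returns the iteration count
def iterateFixed(maxIter: Int): Int = maxIter

// stand-in loop: stops once a dummy residual drops below tol
def iterateWithConvergence(maxIter: Int, tol: Double): Int = {
  var r = 1.0
  var i = 0
  while (i < maxIter && r >= tol) { r /= 2; i += 1 }
  i
}
```

Each branch can then be read, tested, and modified on its own, while the case statement shrinks to pure dispatch.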