Hi,

Could anyone elaborate on regularization in Spark? I've found that L1 and L2 are implemented as Updaters (L1Updater, SquaredL2Updater).

1) Why is the loss reported by L2 equal to (0.5 * regParam * norm * norm), where norm is Norm(weights, 2.0)? I would expect 0.5 * regParam * norm (the 0.5 disappearing after differentiation). It looks as if this got mixed up with the mean squared error. (See the first sketch below.)

2) Why are all the weights regularized? I think the bias weights (a.k.a. free or intercept weights) should be left untouched unless we assume the data is centered. (See the second sketch below.)

3) Are there any short-term plans to move regularization out of the updaters into a more convenient place?
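To make question 1 concrete, here is a simplified sketch of the two variants I mean. This is my own illustration in plain Scala, not the actual MLlib source:

object L2PenaltySketch {
  // What the updater appears to report: 0.5 * regParam * ||w||^2
  def reportedPenalty(weights: Array[Double], regParam: Double): Double = {
    val norm = math.sqrt(weights.map(w => w * w).sum)
    0.5 * regParam * norm * norm
  }

  // What I would have expected: 0.5 * regParam * ||w||
  def expectedPenalty(weights: Array[Double], regParam: Double): Double = {
    val norm = math.sqrt(weights.map(w => w * w).sum)
    0.5 * regParam * norm
  }
}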
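And for question 2, this is roughly what I have in mind. Again just a sketch; I'm assuming here that the intercept sits at index 0 of the weight vector, which may not match MLlib's actual layout:

// Regularize everything except the bias/intercept weight,
// assumed (hypothetically) to be stored at index 0.
def l2PenaltyWithoutIntercept(weights: Array[Double], regParam: Double): Double = {
  val nonBias = weights.drop(1) // leave the intercept unregularized
  0.5 * regParam * nonBias.map(w => w * w).sum
}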
Best regards, Alexander