Ah yeah, I see. Yes, it's true that many algorithms behave quite differently depending on the kind of regularization. The same holds for cutting plane algorithms, which reduce to either linear or quadratic programs depending on whether the regularizer is L1 or L2. Generally speaking, I think this is not surprising, since L1 is not differentiable everywhere, so you'd have to use different optimization methods for it.
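To make the differentiability point concrete, here is a rough sketch in plain Python. The names (`l2_grad`, `l1_prox`, `update`) are made up for illustration and are not taken from any library. The L2 penalty has a gradient everywhere, so it can simply be added to the gradient of the data-fit term. The L1 penalty has a kink at zero, so the sketch uses a soft-thresholding (proximal) step instead, which is one common way to handle it, not the only one.

```python
def l2_grad(w, lam):
    """Gradient of (lam / 2) * ||w||_2^2; defined for every w."""
    return [lam * wi for wi in w]


def l1_prox(w, step_size, lam):
    """Proximal operator of lam * ||w||_1 (soft-thresholding).

    Used in place of a gradient because |w_i| has no derivative at 0.
    """
    t = step_size * lam
    return [max(abs(wi) - t, 0.0) * (1.0 if wi > 0 else -1.0) for wi in w]


def update(w, cost_grad, penalty, step_size, lam):
    """One update step; cost_grad is the gradient of the data-fit term only.

    The `penalty` argument ("l1" or "l2") is a hypothetical switch used
    here just to show that the two penalties need different handling.
    """
    g = cost_grad(w)
    if penalty == "l2":
        # Smooth penalty: add its gradient to the data-fit gradient.
        r = l2_grad(w, lam)
        return [wi - step_size * (gi + ri) for wi, gi, ri in zip(w, g, r)]
    if penalty == "l1":
        # Non-smooth penalty: gradient step on the data-fit term, then prox.
        moved = [wi - step_size * gi for wi, gi in zip(w, g)]
        return l1_prox(moved, step_size, lam)
    raise ValueError("unknown penalty: %r" % (penalty,))
```

Note that with the L1 branch small weights end up exactly at zero, while the L2 branch only shrinks them towards zero.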
So it probably makes sense to keep the regularization separate from the cost function (which is then defined only by the model and the loss function) and handle it as an extra term.

-M

--
Mikio Braun - http://blog.mikiobraun.de, http://twitter.com/mikiobraun