Manish,

Thanks for pointing me to the relevant docs. It is unfortunate that absolute error is not supported yet. I can't seem to find a JIRA for it.
Now, here's what the comment says in the current master branch:

/**
 * :: Experimental ::
 * A class that implements Stochastic Gradient Boosting
 * for regression and binary classification problems.
 *
 * The implementation is based upon:
 *   J.H. Friedman. "Stochastic Gradient Boosting." 1999.
 *
 * Notes:
 *  - This currently can be run with several loss functions. However, only SquaredError is
 *    fully supported. Specifically, the loss function should be used to compute the gradient
 *    (to re-label training instances on each iteration) and to weight weak hypotheses.
 *    Currently, gradients are computed correctly for the available loss functions,
 *    but weak hypothesis weights are not computed correctly for LogLoss or AbsoluteError.
 *    Running with those losses will likely behave reasonably, but lacks the same guarantees.
 ...
 */

By the looks of it, the GradientBoosting API would support an absolute-error loss function to perform quantile regression, were it not for the "weak hypothesis weights". Does this refer to the weights of the leaves of the trees?

Alex

On Mon, Nov 17, 2014 at 2:24 PM, Manish Amde <manish...@gmail.com> wrote:

> Hi Alessandro,
>
> MLlib v1.1 supports variance for regression and gini impurity and entropy
> for classification.
> http://spark.apache.org/docs/latest/mllib-decision-tree.html
>
> If the information gain calculation can be performed by distributed
> aggregation then it might be possible to plug it into the existing
> implementation. We want to perform such calculations (e.g. the median) for
> the gradient boosting models (coming up in the 1.2 release) using absolute
> error and deviance as loss functions, but I don't think anyone is planning
> to work on it yet. :-)
>
> -Manish
>
> On Mon, Nov 17, 2014 at 11:11 AM, Alessandro Baretta
> <alexbare...@gmail.com> wrote:
>
>> I see that, as of v. 1.1, MLlib supports regression and classification
>> tree models.
>> I assume this means that it uses a squared-error loss function for the
>> first and a logistic cost function for the second. I don't see support
>> for quantile regression via an absolute-error cost function. Or am I
>> missing something?
>>
>> If, as it seems, this is missing, how do you recommend implementing it?
>>
>> Alex
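P.S. For anyone following along, here is a small self-contained sketch (in plain Python, independent of the MLlib code) of the point at issue. It is my own illustration, not the MLlib implementation: for absolute-error loss L(y, F) = |y - F|, the pseudo-residual (negative gradient) is sign(y - F), and the constant that a leaf should predict to minimize the loss over its residuals is their median, not their mean (which is what a squared-error leaf fit produces). The function names below are hypothetical.

```python
# Sketch: why AbsoluteError needs median-based leaf values in gradient boosting.
import statistics

def pseudo_residuals_abs(y, preds):
    """Negative gradient of absolute-error loss at the current predictions:
    sign(y_i - F(x_i)) for each training instance."""
    return [1.0 if yi > fi else -1.0 if yi < fi else 0.0
            for yi, fi in zip(y, preds)]

def abs_loss(residuals, c):
    """Total absolute error of predicting the constant c for all residuals."""
    return sum(abs(r - c) for r in residuals)

# Toy residuals within one leaf, with an outlier.
residuals = [0.5, 1.0, 1.5, 2.0, 10.0]

mean_c = sum(residuals) / len(residuals)  # what squared-error fitting would give
median_c = statistics.median(residuals)   # the minimizer of absolute error

# The median strictly beats the mean under absolute-error loss.
assert abs_loss(residuals, median_c) < abs_loss(residuals, mean_c)
print(mean_c, median_c)  # 3.0 1.5
```

For the tau-th quantile more generally, one would use the pinball (quantile) loss, whose gradient is tau or tau - 1 depending on the sign of the residual, and the optimal leaf constant becomes the tau-quantile of the residuals.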