Manish,

Thanks for pointing me to the relevant docs. It is unfortunate that
absolute error is not supported yet. I can't seem to find a Jira for it.

Now, here's what the comment says in the current master branch:
/**
 * :: Experimental ::
 * A class that implements Stochastic Gradient Boosting
 * for regression and binary classification problems.
 *
 * The implementation is based upon:
 *   J.H. Friedman.  "Stochastic Gradient Boosting."  1999.
 *
 * Notes:
 *  - This currently can be run with several loss functions.  However, only SquaredError is
 *    fully supported.  Specifically, the loss function should be used to compute the gradient
 *    (to re-label training instances on each iteration) and to weight weak hypotheses.
 *    Currently, gradients are computed correctly for the available loss functions,
 *    but weak hypothesis weights are not computed correctly for LogLoss or AbsoluteError.
 *    Running with those losses will likely behave reasonably, but lacks the same guarantees.
...
*/

By the looks of it, the GradientBoosting API would support an absolute-error
loss function for quantile regression, were it not for the "weak hypothesis
weights" caveat. Does this refer to the weights of the leaves of the trees?
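
For context, my reading of Friedman (1999) is that after the m-th tree is fit
to the pseudo-residuals, each terminal region R_jm should receive the constant

    gamma_jm = argmin_gamma  sum over {x_i in R_jm} of  L(y_i, F_{m-1}(x_i) + gamma)

which is the mean of the residuals under SquaredError but their median under
AbsoluteError. If the implementation keeps mean-based leaf predictions for
every loss, that would explain why the comment only guarantees SquaredError,
though I may be misreading the code.

In case it's useful, here is how I would expect to set up L1 boosting once 1.2
lands. This is a sketch only: I'm assuming the names currently in master
(GradientBoostedTrees, BoostingStrategy, AbsoluteError) survive to the
release, and trainL1Boosting is just a name I made up for illustration.

    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.tree.GradientBoostedTrees
    import org.apache.spark.mllib.tree.configuration.BoostingStrategy
    import org.apache.spark.mllib.tree.loss.AbsoluteError
    import org.apache.spark.rdd.RDD

    // Sketch: train a boosted ensemble under L1 (absolute-error) loss.
    // Per the comment above, gradients are computed correctly for this loss;
    // only the weak hypothesis (leaf) weights are in question.
    def trainL1Boosting(trainingData: RDD[LabeledPoint]) = {
      // Start from the regression defaults (SquaredError) and swap in L1.
      val boostingStrategy = BoostingStrategy.defaultParams("Regression")
      boostingStrategy.loss = AbsoluteError
      boostingStrategy.numIterations = 100
      boostingStrategy.treeStrategy.maxDepth = 3
      GradientBoostedTrees.train(trainingData, boostingStrategy)
    }

If the leaf values were then re-fit with a per-leaf median (the distributed
aggregation you mention), this would give true quantile regression at the
50th percentile; other quantiles would presumably need a pinball-style loss.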

Alex

On Mon, Nov 17, 2014 at 2:24 PM, Manish Amde <manish...@gmail.com> wrote:

> Hi Alessandro,
>
> MLlib v1.1 supports variance for regression and gini impurity and entropy
> for classification.
> http://spark.apache.org/docs/latest/mllib-decision-tree.html
>
> If the information gain calculation can be performed by distributed
> aggregation, then it might be possible to plug it into the existing
> implementation. We want to perform such calculations (e.g., the median) for
> the gradient boosting models (coming up in the 1.2 release) using absolute
> error and deviance as loss functions, but I don't think anyone is planning
> to work on it yet. :-)
>
> -Manish
>
> On Mon, Nov 17, 2014 at 11:11 AM, Alessandro Baretta <
> alexbare...@gmail.com> wrote:
>
>> I see that, as of v1.1, MLlib supports regression and classification tree
>> models. I assume this means that it uses a squared-error loss function for
>> the former and a logistic cost function for the latter. I don't see support
>> for quantile regression via an absolute-error cost function. Or am I
>> missing something?
>>
>> If, as it seems, this is missing, how do you recommend implementing it?
>>
>> Alex
>>
>
>
