Hi Alessandro,

I think absolute error as a splitting criterion might be feasible with the
current architecture -- the sufficient statistics we already collect may be
able to support it. Could you point us to scenarios where absolute error has
significantly outperformed squared error for regression trees? Also, what is
the use case that makes squared error undesirable for you?
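
To make the sufficient-statistics point concrete, here is a minimal Scala
sketch (illustrative names only, not MLlib's internals) contrasting variance,
which reduces to additive statistics, with absolute error, whose impurity is
built around the median and therefore does not:

// Illustrative only. Variance impurity needs just count, sum and sum of
// squares, which merge additively across partitions/workers.
case class VarianceStats(count: Long, sum: Double, sumSq: Double) {
  def merge(other: VarianceStats): VarianceStats =
    VarianceStats(count + other.count, sum + other.sum, sumSq + other.sumSq)
  // Var(y) = E[y^2] - (E[y])^2
  def impurity: Double =
    if (count == 0L) 0.0 else sumSq / count - math.pow(sum / count, 2)
}

// Absolute-error impurity is the mean of |y - median(y)|; the median is not
// an additive statistic, so it would need e.g. per-node label histograms or
// approximate quantiles instead of the three numbers above.
def absoluteErrorImpurity(labels: Seq[Double]): Double = {
  require(labels.nonEmpty)
  val sorted = labels.sorted
  val median = sorted(sorted.length / 2) // upper median, fine for a sketch
  labels.map(y => math.abs(y - median)).sum / labels.length
}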

For gradient boosting, you are correct. The weak hypothesis weights refer to
the tree predictions in each of the branches. We plan to explain this in the
1.2 documentation and may also add some more clarification to the Javadoc.
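
To make that concrete, here is a rough Scala sketch of where such weights
enter the prediction; the names (trees, treeWeights) are illustrative and not
the exact 1.2 API:

// Sketch of a boosted ensemble's prediction: a weighted sum of the individual
// tree predictions. Friedman's TreeBoost actually fits a separate weight per
// leaf/branch via a line search; a single per-tree weight is shown here for
// brevity. For squared error that weight is essentially the learning rate,
// which is why SquaredError is the fully supported case in the Javadoc note.
def predictEnsemble(features: Array[Double],
                    trees: Seq[Array[Double] => Double],
                    treeWeights: Seq[Double]): Double =
  trees.zip(treeWeights).map { case (tree, w) => w * tree(features) }.sum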

I will try to search for JIRAs or create new ones and update this thread.

-Manish

On Monday, November 17, 2014, Alessandro Baretta <alexbare...@gmail.com>
wrote:

> Manish,
>
> Thanks for pointing me to the relevant docs. It is unfortunate that
> absolute error is not supported yet. I can't seem to find a Jira for it.
>
> Now, here's what the comments say in the current master branch:
> /**
>  * :: Experimental ::
>  * A class that implements Stochastic Gradient Boosting
>  * for regression and binary classification problems.
>  *
>  * The implementation is based upon:
>  *   J.H. Friedman.  "Stochastic Gradient Boosting."  1999.
>  *
>  * Notes:
>  *  - This currently can be run with several loss functions.  However,
> only SquaredError is
>  *    fully supported.  Specifically, the loss function should be used to
> compute the gradient
>  *    (to re-label training instances on each iteration) and to weight
> weak hypotheses.
>  *    Currently, gradients are computed correctly for the available loss
> functions,
>  *    but weak hypothesis weights are not computed correctly for LogLoss
> or AbsoluteError.
>  *    Running with those losses will likely behave reasonably, but lacks
> the same guarantees.
> ...
> */
>
> By the looks of it, the GradientBoosting API would support an absolute
> error type loss function to perform quantile regression, except for "weak
> hypothesis weights". Does this refer to the weights of the leaves of the
> trees?
>
> Alex
>
> On Mon, Nov 17, 2014 at 2:24 PM, Manish Amde <manish...@gmail.com> wrote:
>
>> Hi Alessandro,
>>
>> MLlib v1.1 supports variance as the impurity measure for regression, and
>> Gini impurity and entropy for classification.
>> http://spark.apache.org/docs/latest/mllib-decision-tree.html
>>
>> If the information gain calculation can be performed by distributed
>> aggregation then it might be possible to plug it into the existing
>> implementation. We want to perform such calculations (e.g., the median) for
>> the gradient boosting models (coming up in the 1.2 release) using absolute
>> error and deviance as loss functions, but I don't think anyone is planning
>> to work on it yet. :-)
>>
>> -Manish
>>
>> On Mon, Nov 17, 2014 at 11:11 AM, Alessandro Baretta <alexbare...@gmail.com> wrote:
>>
>>> I see that, as of v. 1.1, MLlib supports regression and classification
>>> tree models. I assume this means that it uses a squared-error loss
>>> function for the first and a logistic cost function for the second. I
>>> don't see support for quantile regression via an absolute error cost
>>> function. Or am I missing something?
>>>
>>> If, as it seems, this is missing, how do you recommend implementing it?
>>>
>>> Alex
>>>
>>
>>
>
