One comment about
"""
1) I agree the sorting method you suggested is a very efficient way to
handle the unordered categorical variables in binary classification
and regression. I propose we have a Spark ML Transformer to do the
sorting and encoding, bringing the benefits to many tree-based
methods.
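
A minimal sketch of the sorting idea discussed above, assuming a regression target and squared-error splits. This is not Spark code and not the proposed Transformer; the function names (order_categories_by_mean, encode, best_ordered_split, best_exhaustive_split) are hypothetical and chosen only for illustration. It orders the categories by mean target, encodes them as ordinal indices, and compares the q - 1 splits along that ordering against an exhaustive search over all 2^(q-1) - 1 two-group partitions.

```python
from collections import defaultdict
from itertools import combinations


def order_categories_by_mean(categories, targets):
    """Return the distinct categories sorted by their mean target, ascending."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    return sorted(sums, key=lambda c: sums[c] / counts[c])


def encode(categories, ordering):
    """Map each category to its position in the given ordering."""
    index = {c: i for i, c in enumerate(ordering)}
    return [index[c] for c in categories]


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_cost(categories, targets, left_set):
    """Squared-error cost of sending the categories in left_set to the left child."""
    left = [y for c, y in zip(categories, targets) if c in left_set]
    right = [y for c, y in zip(categories, targets) if c not in left_set]
    return sse(left) + sse(right)


def best_ordered_split(categories, targets):
    """Best cost over the q - 1 splits along the mean-sorted ordering."""
    ordering = order_categories_by_mean(categories, targets)
    return min(
        split_cost(categories, targets, set(ordering[:k]))
        for k in range(1, len(ordering))
    )


def best_exhaustive_split(categories, targets):
    """Best cost over all 2^(q-1) - 1 partitions into two non-empty groups."""
    cats = sorted(set(categories))
    first, rest = cats[0], cats[1:]
    best = float("inf")
    # Pin the first category to the left group so mirrored partitions
    # are not counted twice.
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {first, *combo}
            if len(left) == len(cats):
                continue  # right group would be empty
            best = min(best, split_cost(categories, targets, left))
    return best
```

For squared-error regression the two searches should find the same best cost, which is why the ordinal encoding can stand in for the exhaustive search. For multiclass targets this shortcut is not guaranteed to hold.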
Hi DB Tsai,
Thank you again for your insightful comments!
1) I agree the sorting method you suggested is a very efficient way to
handle the unordered categorical variables in binary classification
and regression. I propose we have a Spark ML Transformer to do the
sorting and encoding, bringing th
Hi Meihua,
For categorical features, the ordinal issue can be solved by trying
all 2^(q-1) - 1 possible partitions of the q values into two
groups. However, this is computationally expensive. In Hastie's book, in
9.2.4, the trees can be trained by sorting the residuals and being
learnt as if they
Hi YiZhi,
Thank you for mentioning the JIRA. I will add a note to the JIRA.
Meihua
On Mon, Oct 26, 2015 at 6:16 PM, YiZhi Liu wrote:
> There's an XGBoost exploration JIRA, SPARK-8547. Could it be a good starting point?
>
> 2015-10-27 7:07 GMT+08:00 DB Tsai:
>> Also, does it support categorical features?
>>
Hi DB Tsai,
Thank you very much for your interest and comment.
1) Feature sub-sampling is per-node, as in random forest.
2) The current code heavily exploits the tree structure to speed up
the learning (such as processing multiple learning nodes in one pass of
the training data). So a generic GBM is
There's an XGBoost exploration JIRA, SPARK-8547. Could it be a good starting point?
2015-10-27 7:07 GMT+08:00 DB Tsai:
> Also, does it support categorical features?
>
> Sincerely,
>
> DB Tsai
> --
> Web: https://www.dbtsai.com
> PGP Key ID: 0xAF08DF8D
>
Also, does it support categorical features?
Sincerely,
DB Tsai
--
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D
On Mon, Oct 26, 2015 at 4:06 PM, DB Tsai wrote:
> Interesting. For feature sub-sampling, is it per-node or per-tree? Do
>
Interesting. For feature sub-sampling, is it per-node or per-tree? Do
you think you can implement a generic GBM and have it merged into the
Spark codebase?
Sincerely,
DB Tsai
--
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D
On Mon, Oc