GitHub user sethah opened a pull request:

    https://github.com/apache/spark/pull/11903

    [SPARK-13952][ML] Add random seed to GBT

    ## What changes were proposed in this pull request?
    
    `GBTClassifier` and `GBTRegressor` should use a random seed so that results
    are reproducible. Because the current unit tests compare GBTs in ML against
    GBTs in MLlib for equality, I also added a random seed to the MLlib GBT
    algorithm. I added alternate constructors to `mllib.tree.GradientBoostedTrees`
    that accept a random seed, but left them private so as not to change the
    public API unnecessarily.
     
    ## How was this patch tested?
    
    Existing unit tests verify that functionality did not change. Other ML
    algorithms do not seem to have unit tests that directly exercise random
    seeding, but reproducibility with seeding for GBTs is effectively verified
    by the existing tests. I can add more tests if needed.
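
The reproducibility property that the description relies on can be illustrated with a small, self-contained Python sketch. This is not Spark code and not the patch itself: `toy_boost`, its parameters, and the constant "learner" are all hypothetical stand-ins, chosen only to show that routing every random draw through one seeded generator makes two runs comparable for exact equality.

```python
import random


def toy_boost(ys, num_iterations, subsampling_rate, learning_rate, seed):
    """Toy stand-in for stochastic boosting with a constant learner.

    Each iteration draws a random subsample of the residuals and moves a
    single global prediction toward the mean residual of that subsample.
    """
    rng = random.Random(seed)  # all randomness flows from this one seed
    prediction = 0.0
    history = []
    for _ in range(num_iterations):
        residuals = [y - prediction for y in ys]
        # Keep each point independently with probability subsampling_rate.
        sample = [r for r in residuals if rng.random() < subsampling_rate]
        if sample:
            prediction += learning_rate * sum(sample) / len(sample)
        history.append(prediction)
    return history


labels = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
params = dict(num_iterations=10, subsampling_rate=0.5, learning_rate=0.1)
run_a = toy_boost(labels, seed=42, **params)
run_b = toy_boost(labels, seed=42, **params)

# Same data, same parameters, same seed: the two runs match exactly,
# which is what makes an equality-based comparison test meaningful.
assert run_a == run_b
```

Without a shared seed, the two runs would draw different subsamples, so an equality check between two implementations could fail even when both are correct.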


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sethah/spark SPARK-13952

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11903.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11903
    
----
commit afea147f82c4c57fe2c2988f8b6cdcb918d6675a
Author: sethah <[email protected]>
Date:   2016-03-21T22:00:37Z

    add seed to GBT

commit 84e2da64e27585c4cda8ac12b0aed57280f26ebe
Author: sethah <[email protected]>
Date:   2016-03-21T23:31:06Z

    adding seed to MLlib decision trees

commit f71492eb03d9f3e877598dbc2cbabbbbd72186e8
Author: sethah <[email protected]>
Date:   2016-03-22T00:13:57Z

    make constructors private

commit 62aa91d6e9cbaa50df40260e33c35810faf997ff
Author: sethah <[email protected]>
Date:   2016-03-22T15:48:19Z

    cleaning up

----


