GitHub user sethah opened a pull request:
https://github.com/apache/spark/pull/11903
[SPARK-13952][ML] Add random seed to GBT
## What changes were proposed in this pull request?
`GBTClassifier` and `GBTRegressor` should accept a random seed so that results
are reproducible. Because the current unit tests compare GBTs in ML against
GBTs in MLlib for equality, I also added a random seed to the MLlib GBT
algorithm. I added alternate constructors to `mllib.tree.GradientBoostedTrees`
that accept a random seed, but kept them private to avoid changing the public
API unnecessarily.
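
To illustrate why a seed makes boosting reproducible, here is a minimal, self-contained Python sketch. It is not Spark's implementation. It assumes the seed's role is to drive per-iteration row subsampling, and the names `subsample_rows` and `sampling_plan` are hypothetical helpers invented for this example.

```python
import random


def subsample_rows(n_rows, rate, seed, iteration):
    """Return the row indices kept for one boosting iteration (toy example)."""
    # Assumption: combine the base seed with the iteration number so each
    # iteration draws a different sample, yet the whole sequence is repeatable.
    rng = random.Random(seed + iteration)
    return [i for i in range(n_rows) if rng.random() < rate]


def sampling_plan(n_rows, rate, seed, n_iterations):
    """Return the list of per-iteration samples for a full training run."""
    return [subsample_rows(n_rows, rate, seed, it) for it in range(n_iterations)]


# Two runs with the same seed select exactly the same rows in every iteration.
plan_a = sampling_plan(n_rows=20, rate=0.5, seed=42, n_iterations=3)
plan_b = sampling_plan(n_rows=20, rate=0.5, seed=42, n_iterations=3)
print(plan_a == plan_b)
```

With a fixed seed, every source of randomness in training becomes a deterministic function of that seed, which is what allows two implementations to be compared for equality.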
## How was this patch tested?
Existing unit tests verify that functionality did not change. Other ML
algorithms do not seem to have unit tests that directly test random seeding,
but reproducibility with seeding for GBTs is effectively verified by the
existing tests, which compare the ML and MLlib models for equality. I can add
more tests if needed.
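
If a direct seeding test were wanted, it could take roughly the following shape. This is a hedged Python sketch using a toy stand-in for a boosted model, not Spark's test suite or API. `fit_toy_gbt` is a hypothetical function whose weak learner is simply the mean residual over a seeded subsample.

```python
import random


def fit_toy_gbt(labels, n_iterations, rate, learning_rate, seed):
    """Fit a toy boosted model of constant learners; return the step sizes."""
    rng = random.Random(seed)  # the single source of randomness in training
    prediction = 0.0
    steps = []
    for _ in range(n_iterations):
        # Seeded row subsampling for this iteration.
        sample = [y for y in labels if rng.random() < rate]
        if not sample:
            steps.append(0.0)
            continue
        # Weak learner: the mean residual over the sampled rows.
        step = learning_rate * (sum(sample) / len(sample) - prediction)
        prediction += step
        steps.append(step)
    return steps


labels = [0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
first = fit_toy_gbt(labels, n_iterations=5, rate=0.5, learning_rate=0.1, seed=7)
second = fit_toy_gbt(labels, n_iterations=5, rate=0.5, learning_rate=0.1, seed=7)

# Reproducibility check: the same seed must yield an identical model.
assert first == second
```

The check compares the fitted models themselves rather than their predictions, so any unseeded randomness in training would surface as a mismatch.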
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sethah/spark SPARK-13952
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11903.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11903
----
commit afea147f82c4c57fe2c2988f8b6cdcb918d6675a
Author: sethah <[email protected]>
Date: 2016-03-21T22:00:37Z
add seed to GBT
commit 84e2da64e27585c4cda8ac12b0aed57280f26ebe
Author: sethah <[email protected]>
Date: 2016-03-21T23:31:06Z
adding seed to MLlib decision trees
commit f71492eb03d9f3e877598dbc2cbabbbbd72186e8
Author: sethah <[email protected]>
Date: 2016-03-22T00:13:57Z
make constructors private
commit 62aa91d6e9cbaa50df40260e33c35810faf997ff
Author: sethah <[email protected]>
Date: 2016-03-22T15:48:19Z
cleaning up
----