GitHub user jkbradley opened a pull request:
https://github.com/apache/spark/pull/7909
[SPARK-8069] [ML] Add multiclass thresholds for ProbabilisticClassifier
This PR replaces the old "threshold" with a generalized "thresholds" Param.
We keep getThreshold and setThreshold for backwards compatibility with binary
classification.
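For readers unfamiliar with the idea, here is a minimal sketch of how per-class thresholds could generalize a single binary threshold. It is not the code in this PR. It assumes the prediction is the class with the largest probability-to-threshold ratio (one reading of the commit "Use thresholds to scale scores"), and that a binary threshold t maps to the pair [1 - t, t]. The function names are hypothetical and are not part of the Spark API.

```python
from typing import List, Sequence


def predict_with_thresholds(probabilities: Sequence[float],
                            thresholds: Sequence[float]) -> int:
    """Return the index of the class with the largest p / t ratio.

    Assumption: scaling each class probability by its threshold is the
    decision rule; ties go to the lowest class index.
    """
    if len(probabilities) != len(thresholds):
        raise ValueError("probabilities and thresholds must have equal length")
    if any(t <= 0 for t in thresholds):
        raise ValueError("thresholds must be strictly positive")
    scaled = [p / t for p, t in zip(probabilities, thresholds)]
    return max(range(len(scaled)), key=scaled.__getitem__)


def binary_threshold_to_thresholds(threshold: float) -> List[float]:
    """Map a binary threshold t in (0, 1) to the pair [1 - t, t].

    Assumption: under the p / t rule, this pair predicts class 1
    exactly when P(class 1) > t.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return [1.0 - threshold, threshold]


if __name__ == "__main__":
    # Equal thresholds reduce to a plain argmax over the probabilities.
    print(predict_with_thresholds([0.5, 0.3, 0.2], [1.0, 1.0, 1.0]))
    # A low threshold for class 1 makes it easier to predict.
    print(predict_with_thresholds([0.5, 0.3, 0.2], [0.9, 0.3, 0.5]))
```

Under these assumptions a binary caller who sets a threshold of 0.7 gets the same predictions as one who sets thresholds of [0.3, 0.7], which is why the old getter and setter can remain as thin wrappers.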
CC: @holdenk
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jkbradley/spark holdenk-SPARK-8069-add-cutoff-aka-threshold-to-random-forest
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/7909.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #7909
----
commit 0ef228c68eb7f22f19dd4f529ea88def2a897b63
Author: Holden Karau <[email protected]>
Date: 2015-07-01T22:26:29Z
Add hasthresholds
commit 31d6bf2e2228e2b33a73d0d6148f033facc1c292
Author: Holden Karau <[email protected]>
Date: 2015-07-01T23:05:24Z
Start threading the threshold info through
commit 1fed644eff52fdd6501b4eaf30450ea6f4dad327
Author: Holden Karau <[email protected]>
Date: 2015-07-02T00:08:08Z
Use thresholds to scale scores in random forest classifcation
commit 5d999d233feb2ae7497fe37e2c5ff0fc8b6e9d23
Author: Holden Karau <[email protected]>
Date: 2015-07-02T01:06:51Z
Some more progress, start adding a test (maybe try and see if we can find a
better thing to use for the base of the test)
commit a7d59c8f2e8e90efd6ec6129857db843fa75cfd9
Author: Holden Karau <[email protected]>
Date: 2015-07-02T23:03:29Z
Move thresholding into Classifier trait
commit f70eb5e306174925c69a74b6a56dbedabaea3b80
Author: Holden Karau <[email protected]>
Date: 2015-07-02T23:44:15Z
Fix test compile issues
commit 0f468368d2fe0008330c191d17d97fa94a38094d
Author: Holden Karau <[email protected]>
Date: 2015-07-02T23:57:16Z
Start adding a classifiersuite
commit 099c0f364725e849429d4551aa8576ecaf738e18
Author: Holden Karau <[email protected]>
Date: 2015-07-03T00:23:59Z
Move thresholds around some more (set on model not trainer)
commit 85c9e01fc960ab343a857f63643bf9f7f1aa16e5
Author: Holden Karau <[email protected]>
Date: 2015-07-03T01:14:51Z
Test passes again... little fnur
commit 634b06f79eb340170980ee8c6952e57373faf249
Author: Holden Karau <[email protected]>
Date: 2015-07-03T01:34:03Z
Some progress towards unifying threshold and thresholds
commit f338cfc786e5d5185318195a6a8b2ee34906ce52
Author: Holden Karau <[email protected]>
Date: 2015-07-03T01:34:08Z
Wait that wasn't a good idea, Revert "Some progress towards unifying
threshold and thresholds"
This reverts commit f8538a65a265e86724922aa63b0cc602a3c7603f.
commit 2f44b187e71f6b77521ab284cf32e52c3447e58c
Author: Holden Karau <[email protected]>
Date: 2015-07-03T01:38:46Z
Add a global default of null for thresholds param
commit 1986fa8571848d899cec21afba951f32032e1142
Author: Holden Karau <[email protected]>
Date: 2015-07-03T01:42:39Z
Setting the thresholds only makes sense if the underlying class hasn't
overridden predict, so lets push it down.
commit 74f54c3e00dcc770d58b92e38cfbc3499a5644b0
Author: Holden Karau <[email protected]>
Date: 2015-07-03T02:06:40Z
Fix creation of vote array
commit 6b34809ecee4954866446b3ad6e0fbf96dbe0c48
Author: Holden Karau <[email protected]>
Date: 2015-07-03T02:07:01Z
Add a test with thresholding for the RFCS
commit efb90840fb475743314680458030f758c075a96e
Author: Holden Karau <[email protected]>
Date: 2015-07-03T02:13:43Z
move setThresholds only to where its used
commit 1f09a2e28d16a1033f882c3e4ba631710b854b52
Author: Holden Karau <[email protected]>
Date: 2015-07-03T02:32:13Z
try and hide threshold but chainges the API so no dice there
commit 1433e52db1252b640ea8b21475513ccb60e7f0de
Author: Holden Karau <[email protected]>
Date: 2015-07-03T02:32:19Z
Revert "try and hide threshold but chainges the API so no dice there"
This reverts commit 90ef80fbc7b1b55bdce24c107a0a30603ceb6f9a.
commit 978e77a3b299f7d4364f287bfc4d9f31f54409c3
Author: Holden Karau <[email protected]>
Date: 2015-07-06T19:15:53Z
Move HasThreshold into classifier params and start defining the overloaded
getThreshold/getThresholds functions
commit 0420290481df74401f2e0848e7519648c6aca2b9
Author: Holden Karau <[email protected]>
Date: 2015-07-07T03:10:13Z
Allow us to override the get methods selectively
commit ffc8dab085f62f374c045ce637a2da40898d967d
Author: Holden Karau <[email protected]>
Date: 2015-07-07T03:15:24Z
Update the sharedParams
commit 6f14314ece5875e1bc9cf4f88b39b5aa809aa9d6
Author: Holden Karau <[email protected]>
Date: 2015-07-07T03:34:55Z
Since hasthreshold/hasthresholds is in root classifier now
commit a0f3b0c634b5d5a41f721dd5cb273e33c0ef4363
Author: Holden Karau <[email protected]>
Date: 2015-07-07T03:46:38Z
scala style fixes
commit 3456ed3eaa772190c69e2e5034453802e5f3456a
Author: Holden Karau <[email protected]>
Date: 2015-07-07T04:09:05Z
Add explicit return types even though just test
commit 8d92cac732b268e01823c8a927781285919ec454
Author: Holden Karau <[email protected]>
Date: 2015-07-07T04:44:07Z
Use ClassifierParams as the head
commit e09919ce5f5a440a94ed6ec036c665c3e067a6c7
Author: Holden Karau <[email protected]>
Date: 2015-07-07T21:03:28Z
Fix return type, I need more coffee....
commit 638854cd2e23b7e16686b98a00d1813eb59d2c7f
Author: Holden Karau <[email protected]>
Date: 2015-07-07T23:11:21Z
Add a scala RandomForestClassifierSuite test based on corresponding python
test
commit 4893bdc78be8fac5245b0dbc670d9b5581863838
Author: Holden Karau <[email protected]>
Date: 2015-07-07T23:14:17Z
Use numtrees of 3 since previous result was tied (one tree for each) and
the switch from different max methods picked a different element (since they
were equal I think this is ok)
commit 398078a6ee9ccb5502588d42082f70f257a0ca0d
Author: Holden Karau <[email protected]>
Date: 2015-08-01T22:47:54Z
move the thresholding around a bunch based on the design doc
commit adf15b450a7963b363432cd76fda71cdf081a8c9
Author: Holden Karau <[email protected]>
Date: 2015-08-01T22:49:21Z
rename the classifier suite test to ProbabilisticClassifierSuite now that
we only have it in Probabilistic
----