GitHub user jkbradley opened a pull request:

    https://github.com/apache/spark/pull/7909

    [SPARK-8069] [ML] Add multiclass thresholds for ProbabilisticClassifier

    This PR replaces the old "threshold" Param with a generalized "thresholds" Param. We keep getThreshold and setThreshold for backwards compatibility in binary classification.
    
    CC: @holdenk 
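The description above does not spell out how a "thresholds" array turns class probabilities into a prediction. The following is a minimal Scala sketch under two assumptions: each class probability is divided by that class's threshold and the argmax is predicted, and a binary `threshold` t corresponds to `thresholds` of (1 - t, t). `ThresholdSketch`, `predictWithThresholds`, and `binaryThresholds` are hypothetical names, not the Param API added by this PR.

```scala
// Hypothetical sketch of multiclass thresholding. Assumes the rule
// "scale each class probability by 1 / threshold, then take the argmax".
// ThresholdSketch and its methods are illustrative names, not Spark API.
object ThresholdSketch {

  /** Index of the largest element; ties keep the lowest index. */
  def argmax(xs: Array[Double]): Int = {
    require(xs.nonEmpty, "need at least one class")
    var best = 0
    var i = 1
    while (i < xs.length) {
      if (xs(i) > xs(best)) best = i
      i += 1
    }
    best
  }

  /** Predict the class whose probability, divided by its threshold, is largest. */
  def predictWithThresholds(probability: Array[Double], thresholds: Array[Double]): Int = {
    require(probability.length == thresholds.length, "need one threshold per class")
    require(thresholds.forall(_ >= 0.0), "thresholds must be non-negative")
    val scaled = probability.zip(thresholds).map { case (p, t) =>
      // A zero threshold maps the score to +Infinity (avoiding 0 / 0), so that
      // class wins unless a lower-indexed class also has a zero threshold.
      if (t == 0.0) Double.PositiveInfinity else p / t
    }
    argmax(scaled)
  }

  /** Assumed binary mapping: a single threshold t becomes thresholds (1 - t, t). */
  def binaryThresholds(t: Double): Array[Double] = {
    require(t >= 0.0 && t <= 1.0, "threshold must lie in [0, 1]")
    Array(1.0 - t, t)
  }
}
```

For example, with class probabilities (0.4, 0.6), `binaryThresholds(0.5)` yields prediction 1 and `binaryThresholds(0.7)` yields prediction 0. Under this assumed mapping a binary model predicts class 1 exactly when p1 > t (for 0 < t < 1), which is one way a single-threshold getter and setter can stay meaningful for binary classification.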

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark holdenk-SPARK-8069-add-cutoff-aka-threshold-to-random-forest

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7909.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #7909
    
----
commit 0ef228c68eb7f22f19dd4f529ea88def2a897b63
Author: Holden Karau <[email protected]>
Date:   2015-07-01T22:26:29Z

    Add hasthresholds

commit 31d6bf2e2228e2b33a73d0d6148f033facc1c292
Author: Holden Karau <[email protected]>
Date:   2015-07-01T23:05:24Z

    Start threading the threshold info through

commit 1fed644eff52fdd6501b4eaf30450ea6f4dad327
Author: Holden Karau <[email protected]>
Date:   2015-07-02T00:08:08Z

    Use thresholds to scale scores in random forest classifcation

commit 5d999d233feb2ae7497fe37e2c5ff0fc8b6e9d23
Author: Holden Karau <[email protected]>
Date:   2015-07-02T01:06:51Z

    Some more progress, start adding a test (maybe try and see if we can find a better thing to use for the base of the test)

commit a7d59c8f2e8e90efd6ec6129857db843fa75cfd9
Author: Holden Karau <[email protected]>
Date:   2015-07-02T23:03:29Z

    Move thresholding into Classifier trait

commit f70eb5e306174925c69a74b6a56dbedabaea3b80
Author: Holden Karau <[email protected]>
Date:   2015-07-02T23:44:15Z

    Fix test compile issues

commit 0f468368d2fe0008330c191d17d97fa94a38094d
Author: Holden Karau <[email protected]>
Date:   2015-07-02T23:57:16Z

    Start adding a classifiersuite

commit 099c0f364725e849429d4551aa8576ecaf738e18
Author: Holden Karau <[email protected]>
Date:   2015-07-03T00:23:59Z

    Move thresholds around some more (set on model not trainer)

commit 85c9e01fc960ab343a857f63643bf9f7f1aa16e5
Author: Holden Karau <[email protected]>
Date:   2015-07-03T01:14:51Z

    Test passes again... little fnur

commit 634b06f79eb340170980ee8c6952e57373faf249
Author: Holden Karau <[email protected]>
Date:   2015-07-03T01:34:03Z

    Some progress towards unifying threshold and thresholds

commit f338cfc786e5d5185318195a6a8b2ee34906ce52
Author: Holden Karau <[email protected]>
Date:   2015-07-03T01:34:08Z

    Wait that wasn't a good idea, Revert "Some progress towards unifying threshold and thresholds"
    
    This reverts commit f8538a65a265e86724922aa63b0cc602a3c7603f.

commit 2f44b187e71f6b77521ab284cf32e52c3447e58c
Author: Holden Karau <[email protected]>
Date:   2015-07-03T01:38:46Z

    Add a global default of null for thresholds param

commit 1986fa8571848d899cec21afba951f32032e1142
Author: Holden Karau <[email protected]>
Date:   2015-07-03T01:42:39Z

    Setting the thresholds only makes sense if the underlying class hasn't overridden predict, so lets push it down.

commit 74f54c3e00dcc770d58b92e38cfbc3499a5644b0
Author: Holden Karau <[email protected]>
Date:   2015-07-03T02:06:40Z

    Fix creation of vote array

commit 6b34809ecee4954866446b3ad6e0fbf96dbe0c48
Author: Holden Karau <[email protected]>
Date:   2015-07-03T02:07:01Z

    Add a test with thresholding for the RFCS

commit efb90840fb475743314680458030f758c075a96e
Author: Holden Karau <[email protected]>
Date:   2015-07-03T02:13:43Z

    move setThresholds only to where its used

commit 1f09a2e28d16a1033f882c3e4ba631710b854b52
Author: Holden Karau <[email protected]>
Date:   2015-07-03T02:32:13Z

    try and hide threshold but chainges the API so no dice there

commit 1433e52db1252b640ea8b21475513ccb60e7f0de
Author: Holden Karau <[email protected]>
Date:   2015-07-03T02:32:19Z

    Revert "try and hide threshold but chainges the API so no dice there"
    
    This reverts commit 90ef80fbc7b1b55bdce24c107a0a30603ceb6f9a.

commit 978e77a3b299f7d4364f287bfc4d9f31f54409c3
Author: Holden Karau <[email protected]>
Date:   2015-07-06T19:15:53Z

    Move HasThreshold into classifier params and start defining the overloaded getThreshold/getThresholds functions

commit 0420290481df74401f2e0848e7519648c6aca2b9
Author: Holden Karau <[email protected]>
Date:   2015-07-07T03:10:13Z

    Allow us to override the get methods selectively

commit ffc8dab085f62f374c045ce637a2da40898d967d
Author: Holden Karau <[email protected]>
Date:   2015-07-07T03:15:24Z

    Update the sharedParams

commit 6f14314ece5875e1bc9cf4f88b39b5aa809aa9d6
Author: Holden Karau <[email protected]>
Date:   2015-07-07T03:34:55Z

    Since hasthreshold/hasthresholds is in root classifier now

commit a0f3b0c634b5d5a41f721dd5cb273e33c0ef4363
Author: Holden Karau <[email protected]>
Date:   2015-07-07T03:46:38Z

    scala style fixes

commit 3456ed3eaa772190c69e2e5034453802e5f3456a
Author: Holden Karau <[email protected]>
Date:   2015-07-07T04:09:05Z

    Add explicit return types even though just test

commit 8d92cac732b268e01823c8a927781285919ec454
Author: Holden Karau <[email protected]>
Date:   2015-07-07T04:44:07Z

    Use ClassifierParams as the head

commit e09919ce5f5a440a94ed6ec036c665c3e067a6c7
Author: Holden Karau <[email protected]>
Date:   2015-07-07T21:03:28Z

    Fix return type, I need more coffee....

commit 638854cd2e23b7e16686b98a00d1813eb59d2c7f
Author: Holden Karau <[email protected]>
Date:   2015-07-07T23:11:21Z

    Add a scala RandomForestClassifierSuite test based on corresponding python test

commit 4893bdc78be8fac5245b0dbc670d9b5581863838
Author: Holden Karau <[email protected]>
Date:   2015-07-07T23:14:17Z

    Use numtrees of 3 since previous result was tied (one tree for each) and the switch from different max methods picked a different element (since they were equal I think this is ok)

commit 398078a6ee9ccb5502588d42082f70f257a0ca0d
Author: Holden Karau <[email protected]>
Date:   2015-08-01T22:47:54Z

    move the thresholding around a bunch based on the design doc

commit adf15b450a7963b363432cd76fda71cdf081a8c9
Author: Holden Karau <[email protected]>
Date:   2015-08-01T22:49:21Z

    rename the classifier suite test to ProbabilisticClassifierSuite now that we only have it in Probabilistic

----
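Commit 4893bdc explains the switch to `numTrees` of 3: as its message describes it, each of the two trees voted for a different class, and the two max methods broke that tie differently. The following is a hypothetical Scala sketch of that situation, assuming the forest predicts by counting one vote per tree into a vote array and taking an argmax. `VoteTieSketch`, `voteCounts`, `firstMax`, and `lastMax` are illustrative names, not code from the patch.

```scala
// Hypothetical sketch: one-vote-per-tree counting plus two argmax rules that
// differ only in how they break ties. Not the Spark implementation.
object VoteTieSketch {

  /** Count one vote per tree prediction into an array indexed by class. */
  def voteCounts(treePredictions: Seq[Int], numClasses: Int): Array[Double] = {
    val votes = Array.fill(numClasses)(0.0)
    treePredictions.foreach { c =>
      require(c >= 0 && c < numClasses, s"class $c out of range")
      votes(c) += 1.0
    }
    votes
  }

  /** Argmax that keeps the first maximal index on ties (strict >). */
  def firstMax(xs: Array[Double]): Int = {
    require(xs.nonEmpty, "need at least one class")
    xs.indices.foldLeft(0)((best, i) => if (xs(i) > xs(best)) i else best)
  }

  /** Argmax that keeps the last maximal index on ties (>=). */
  def lastMax(xs: Array[Double]): Int = {
    require(xs.nonEmpty, "need at least one class")
    xs.indices.foldLeft(0)((best, i) => if (xs(i) >= xs(best)) i else best)
  }
}
```

With tree predictions Seq(0, 1) the counts are (1.0, 1.0), so `firstMax` returns 0 and `lastMax` returns 1. With three trees, two classes can never split the votes evenly, so both rules agree; for Seq(0, 1, 1) both return 1.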


