yuhao yang created SPARK-18704:
----------------------------------
Summary: CrossValidator should preserve more tuning statistics
Key: SPARK-18704
URL: https://issues.apache.org/jira/browse/SPARK-18704
Project: Spark
Issue Type: Improvement
Components: ML
Reporter: yuhao yang
Priority: Minor
Currently CrossValidator trains (k folds * number of paramMaps) different
models during the training process, yet it passes only the average metrics to
CrossValidatorModel. As a result, important information such as the variance of
the metric for a given paramMap cannot be retrieved, and users cannot tell
whether the chosen k is appropriate. Since CrossValidator is relatively
expensive, we probably want to get the most from the tuning process.
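To illustrate what is lost when only the averages are kept, here is a minimal sketch in plain Python (not Spark code). It assumes a hypothetical per-fold metrics matrix with one row per fold and one column per paramMap; the name summarize_folds and the numbers are made up for illustration.

```python
from statistics import mean, stdev

# Hypothetical per-fold metrics: one row per fold, one column per paramMap.
# The values are invented for illustration and do not come from a real run.
fold_metrics = [
    [9.70, 9.80],
    [9.90, 9.60],
    [9.80, 9.70],
]

def summarize_folds(metrics):
    """Return (mean, sample standard deviation) for each paramMap column."""
    columns = list(zip(*metrics))  # transpose: one tuple per paramMap
    return [(mean(col), stdev(col)) for col in columns]

summary = summarize_folds(fold_metrics)
for i, (avg, sd) in enumerate(summary):
    print(f"paramMap {i}: mean={avg:.4f} stdev={sd:.4f}")
```

With only the per-column means available, the spread across folds cannot be recovered, which is the gap described above.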
Just want to see if this sounds good. In my opinion, this can be done either by
passing a metrics matrix to the CrossValidatorModel or by introducing a
CrossValidatorSummary. I would vote for introducing a TuningSummary class,
which could also be used by TrainValidationSplit. In the summary we can present
better statistics for the tuning process, something like a DataFrame:
+---------------+------------+--------+-----------------+
|elasticNetParam|fitIntercept|regParam|metrics          |
+---------------+------------+--------+-----------------+
|0.0            |true        |0.1     |9.747795248932505|
|0.0            |true        |0.01    |9.751942357398603|
|0.0            |false       |0.1     |9.71727627087487 |
|0.0            |false       |0.01    |9.721149803723822|
|0.5            |true        |0.1     |9.719358515436005|
|0.5            |true        |0.01    |9.748121645368501|
|0.5            |false       |0.1     |9.687771328829479|
|0.5            |false       |0.01    |9.717304811419261|
|1.0            |true        |0.1     |9.696769467196487|
|1.0            |true        |0.01    |9.744325276259957|
|1.0            |false       |0.1     |9.665822167122172|
|1.0            |false       |0.01    |9.713484065511892|
+---------------+------------+--------+-----------------+
Using the DataFrame, users can better understand the effect of different
parameters.
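As a rough sketch of how such summary rows could be assembled, the plain-Python example below pairs each paramMap with its metric. It is not the proposed Spark API: build_summary_rows is a hypothetical helper, paramMaps are represented as ordinary dicts, and the values are the first four rows of the table above.

```python
# Hypothetical inputs: each paramMap as a plain dict, with its averaged metric
# at the same index. Values are the first four rows of the table above.
param_maps = [
    {"elasticNetParam": 0.0, "fitIntercept": True, "regParam": 0.1},
    {"elasticNetParam": 0.0, "fitIntercept": True, "regParam": 0.01},
    {"elasticNetParam": 0.0, "fitIntercept": False, "regParam": 0.1},
    {"elasticNetParam": 0.0, "fitIntercept": False, "regParam": 0.01},
]
avg_metrics = [
    9.747795248932505,
    9.751942357398603,
    9.71727627087487,
    9.721149803723822,
]

def build_summary_rows(param_maps, metrics):
    """Pair each paramMap with its metric, one output row per paramMap."""
    if len(param_maps) != len(metrics):
        raise ValueError("expected exactly one metric per paramMap")
    # dict(pm, metrics=m) copies pm, so the input paramMaps are not mutated.
    return [dict(pm, metrics=m) for pm, m in zip(param_maps, metrics)]

rows = build_summary_rows(param_maps, avg_metrics)
for row in rows:
    print(row)
```

A list of flat rows like this maps naturally onto a DataFrame with one column per parameter plus a metrics column.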