[ https://issues.apache.org/jira/browse/FLINK-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618458#comment-14618458 ]
ASF GitHub Bot commented on FLINK-2157:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/871#discussion_r34138534

    --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/pipeline/Predictor.scala ---
    @@ -72,12 +74,36 @@ trait Predictor[Self] extends Estimator[Self] with WithParameters {
           */
         def evaluate[Testing, PredictionValue](
             testing: DataSet[Testing],
    -        evaluateParameters: ParameterMap = ParameterMap.Empty)(implicit
    -        evaluator: EvaluateDataSetOperation[Self, Testing, PredictionValue])
    +        evaluateParameters: ParameterMap = ParameterMap.Empty)
    +        (implicit evaluator: EvaluateDataSetOperation[Self, Testing, PredictionValue])
           : DataSet[(PredictionValue, PredictionValue)] = {
           FlinkMLTools.registerFlinkMLTypes(testing.getExecutionEnvironment)
           evaluator.evaluateDataSet(this, evaluateParameters, testing)
         }
    +
    +    /** Calculates a numerical score for the [[Predictor]]
    +      *
    +      * By convention, higher scores are considered better, so even if a loss is used as a
    +      * performance measure, it will be negated, so that higher is better.
    +      *
    +      * @param testing The evaluation DataSet that contains the features and the true value
    +      * @param evaluateOperation An EvaluateDataSetOperation that produces Double results
    +      * @tparam Testing The type of the features and true value, for example [[LabeledVector]]
    +      * @return A DataSet containing one Double that indicates the score of the predictor
    +      */
    +    def score[Testing](testing: DataSet[Testing])
    --- End diff --

    I'm not so sure whether this function in its current form should be part of the `Predictor` interface. In effect it says that every `Predictor` must have an `EvaluateDataSetOperation`, and thus also a `PredictDataSetOperation`, which produces a `Double` value. This might be true for many `Predictors`, but not for all. For example, imagine a classifier whose predicted classes are strings.

> Create evaluation framework for ML library
> ------------------------------------------
>
>                 Key: FLINK-2157
>                 URL: https://issues.apache.org/jira/browse/FLINK-2157
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Theodore Vasiloudis
>              Labels: ML
>             Fix For: 0.10
>
>
> Currently, FlinkML lacks the means to evaluate the performance of trained models. It would be great to add some {{Evaluators}} which can calculate a score based on the true and predicted labels. This could also be used during cross validation to choose the right hyperparameters.
> Possible scores could be the F1 score [1], the zero-one loss, etc.
>
> Resources
> [1] [http://en.wikipedia.org/wiki/F1_score]
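To make the review comment's type concern concrete, here is a minimal, hypothetical sketch (the object Scoring and method zeroOneScore are illustration names, not FlinkML API): scoring lives outside the Predictor trait and is constrained to Double-producing evaluate operations, computing a negated zero-one loss as the issue suggests.

    import org.apache.flink.api.scala._
    import org.apache.flink.ml.pipeline.{EvaluateDataSetOperation, Predictor}

    // Hypothetical sketch, not FlinkML API: scoring is only callable when a
    // Double-producing EvaluateDataSetOperation is in implicit scope. A
    // classifier that predicts String classes provides no such implicit, so it
    // simply cannot be scored this way; no trait-wide requirement is imposed.
    object Scoring {
      def zeroOneScore[Self <: Predictor[Self], Testing](
          predictor: Self,
          testing: DataSet[Testing])(implicit
          evaluator: EvaluateDataSetOperation[Self, Testing, Double])
        : DataSet[Double] = {
        predictor
          .evaluate(testing)                                        // (truth, prediction) pairs
          .map(pair => (if (pair._1 == pair._2) 0.0 else 1.0, 1L))  // (loss, count) per example
          .reduce((a, b) => (a._1 + b._1, a._2 + b._2))             // sum losses and counts
          .map(sums => -(sums._1 / sums._2))                        // negated mean loss: higher is better
      }
    }

Under these assumptions, a predictor with a Double-producing evaluate operation could be scored via Scoring.zeroOneScore(predictor, testData), while a string-label classifier would fail to compile at the call site rather than forcing the constraint onto every Predictor.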