GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/772
[FLINK-2116] [ml] Reusing predict operation for evaluation

This PR adds an `evaluate` method to `Predictor` which takes a `DataSet[Testing]` and returns a `DataSet[(LabelType, LabelType)]`, where the first tuple field is the true label and the second field is the predicted label. The evaluation logic is defined via an `EvaluateDataSetOperation`.

Since predicting and evaluating test data both rely on the same prediction logic, a new level of abstraction was introduced. The old `PredictOperation` is now called `PredictDataSetOperation`, and a new `PredictOperation` has been defined. The new `PredictOperation` takes a single element of the data set together with the model of the associated `Predictor` and calculates one prediction.

To implement the predict operation of a `Predictor`, one can work either at the level of `PredictDataSetOperation`, which gives access to the whole `DataSet` of input elements, or at the level of `PredictOperation`. If one chooses the latter, the system automatically applies this operation to every element of the input `DataSet` (see `Predictor.defaultPredictDataSetOperation`).

Defining a `PredictOperation` also makes it possible to call `evaluate` on this `Predictor` without having to define an `EvaluateDataSetOperation`. The only constraint is that the input data has to be a `DataSet[(TestingType, LabelType)]`, i.e. each element is a tuple of a testing value and its true label. The system then calculates the prediction for the testing value and returns a `DataSet[(LabelType, LabelType)]` in which the first field is the true label and the second field is the predicted label.

What do you think of these changes? Will they ease the development of future `Predictor`s?
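The two-level abstraction described above can be illustrated with a small, dependency-free Scala sketch. It is not the Flink ML code: a plain `Seq` stands in for `DataSet`, `Predictor` is reduced to a class wrapping an already fitted model, and `ThresholdModel` and `PredictorSketch` are hypothetical names. The trait names mirror those in the description, but their signatures here are assumptions made for illustration.

```scala
// Hypothetical, dependency-free sketch of the abstraction described in the PR.
// Seq stands in for Flink's DataSet; this is NOT the actual Flink ML API.

// Per-element prediction: one test value plus the model gives one prediction.
trait PredictOperation[Model, Testing, Prediction] {
  def predict(value: Testing, model: Model): Prediction
}

// Whole-input prediction: sees the complete collection of test values.
trait PredictDataSetOperation[Model, Testing, Prediction] {
  def predictDataSet(model: Model, input: Seq[Testing]): Seq[Prediction]
}

object PredictDataSetOperation {
  // Default: lift a per-element PredictOperation over every input element.
  implicit def defaultPredictDataSetOperation[Model, Testing, Prediction](
      implicit op: PredictOperation[Model, Testing, Prediction]
  ): PredictDataSetOperation[Model, Testing, Prediction] =
    new PredictDataSetOperation[Model, Testing, Prediction] {
      def predictDataSet(model: Model, input: Seq[Testing]): Seq[Prediction] =
        input.map(value => op.predict(value, model))
    }
}

// Evaluation: turns (test value, true label) pairs into (true label, predicted label).
trait EvaluateDataSetOperation[Model, Testing, Label] {
  def evaluateDataSet(model: Model, input: Seq[(Testing, Label)]): Seq[(Label, Label)]
}

object EvaluateDataSetOperation {
  // Default: reuse the per-element PredictOperation, keeping the true label first.
  implicit def defaultEvaluateDataSetOperation[Model, Testing, Label](
      implicit op: PredictOperation[Model, Testing, Label]
  ): EvaluateDataSetOperation[Model, Testing, Label] =
    new EvaluateDataSetOperation[Model, Testing, Label] {
      def evaluateDataSet(model: Model, input: Seq[(Testing, Label)]): Seq[(Label, Label)] =
        input.map { case (value, truth) => (truth, op.predict(value, model)) }
    }
}

// Simplified Predictor: wraps a fitted model and delegates to the operation found implicitly.
class Predictor[Model](val model: Model) {
  def predict[Testing, Prediction](input: Seq[Testing])(
      implicit op: PredictDataSetOperation[Model, Testing, Prediction]
  ): Seq[Prediction] = op.predictDataSet(model, input)

  def evaluate[Testing, Label](input: Seq[(Testing, Label)])(
      implicit op: EvaluateDataSetOperation[Model, Testing, Label]
  ): Seq[(Label, Label)] = op.evaluateDataSet(model, input)
}

// Hypothetical toy model: predicts true when the value exceeds the threshold.
case class ThresholdModel(threshold: Double)

object ThresholdModel {
  // Only the per-element operation is defined; predict and evaluate are derived from it.
  implicit val thresholdPredict: PredictOperation[ThresholdModel, Double, Boolean] =
    new PredictOperation[ThresholdModel, Double, Boolean] {
      def predict(value: Double, model: ThresholdModel): Boolean = value > model.threshold
    }
}

object PredictorSketch {
  val predictor = new Predictor(ThresholdModel(0.5))
  val predictions: Seq[Boolean] = predictor.predict(Seq(0.2, 0.7))
  val evaluation: Seq[(Boolean, Boolean)] =
    predictor.evaluate(Seq((0.2, false), (0.7, true), (0.9, false)))

  def main(args: Array[String]): Unit = {
    println(predictions)
    println(evaluation)
  }
}
```

In this sketch the toy model supplies only the per-element `predict`; both `predict` and `evaluate` on the `Predictor` then work through the implicitly derived default operations. A predictor that needs to see the whole input at once could instead provide its own `PredictDataSetOperation` instance rather than a `PredictOperation`.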
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink evaluatePredictor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/772.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #772

----
commit 49c02514a6a23d7ef95ce46966ff7ee7a1f407ad
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2015-06-02T12:34:27Z

    [FLINK-2116] [ml] Adds evaluate method to Predictor.

    Adds PredictOperation which can be reused by evaluate if the input data is of the format (TestingType, LabelType) where the second tuple field represents the true label.

----