GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/772

    [FLINK-2116] [ml] Reusing predict operation for evaluation

    This PR adds an `evaluate` method to `Predictor` which takes a 
`DataSet[Testing]` and returns a `DataSet[(LabelType, LabelType)]`, where the 
first tuple field is the true label and the second field is the predicted 
label. The evaluation logic is defined via an `EvaluateDataSetOperation`.
    
    Since predicting on test data and evaluating on test data both use the 
same prediction logic, a new level of abstraction was introduced. The old 
`PredictOperation` is now called `PredictDataSetOperation`, and a new 
`PredictOperation` was defined. The new `PredictOperation` takes an element 
of the dataset as well as the model of the associated `Predictor` and 
calculates one prediction.
    
    To implement the predict operation of a `Predictor`, one can work either 
at the level of `PredictDataSetOperation`, which gives access to the whole 
`DataSet` of input elements, or at the level of `PredictOperation`. In the 
latter case, the system automatically applies the operation to all elements 
of the input `DataSet` (see 
`Predictor.defaultPredictDataSetOperation`).
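    
    To make the two levels concrete, here is a minimal, simplified sketch. 
It is not the code from this PR: a plain `Seq` stands in for Flink's 
`DataSet`, the trait signatures are reduced to the essentials, and 
`Threshold` and `thresholdPredict` are hypothetical names used only for 
illustration.

```scala
// Simplified sketch: Seq stands in for Flink's DataSet. The signatures are
// illustrative assumptions, not the actual FlinkML definitions.
object PredictSketch {
  // Element level: one input element plus the model yields one prediction.
  trait PredictOperation[Model, Testing, Prediction] {
    def predict(value: Testing, model: Model): Prediction
  }

  // Dataset level: the operation sees the whole input collection.
  trait PredictDataSetOperation[Model, Testing, Prediction] {
    def predictDataSet(model: Model, input: Seq[Testing]): Seq[Prediction]
  }

  // Lifts an element-level operation to the dataset level by applying it
  // to every element of the input.
  def defaultPredictDataSetOperation[Model, Testing, Prediction](
      op: PredictOperation[Model, Testing, Prediction])
    : PredictDataSetOperation[Model, Testing, Prediction] =
    new PredictDataSetOperation[Model, Testing, Prediction] {
      def predictDataSet(model: Model, input: Seq[Testing]): Seq[Prediction] =
        input.map(x => op.predict(x, model))
    }

  // Hypothetical model: classify a value as 1 if it reaches the cut, else 0.
  final case class Threshold(cut: Double)

  val thresholdPredict: PredictOperation[Threshold, Double, Int] =
    new PredictOperation[Threshold, Double, Int] {
      def predict(value: Double, model: Threshold): Int =
        if (value >= model.cut) 1 else 0
    }

  def main(args: Array[String]): Unit = {
    val lifted = defaultPredictDataSetOperation(thresholdPredict)
    // prints List(0, 1, 1)
    println(lifted.predictDataSet(Threshold(0.5), Seq(0.1, 0.5, 0.9)))
  }
}
```

    In this sketch only the element-level `predict` has to be written by 
hand; the dataset-level operation is derived from it.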
    
    Once a `PredictOperation` is defined, `evaluate` can be called 
automatically for this `Predictor` without having to define an 
`EvaluateDataSetOperation`. The only constraint is that the input data has 
to be of type `DataSet[(TestingType, LabelType)]`, that is, tuples of a 
testing value and the true label value. The system then calculates the 
prediction for the testing value and returns a 
`DataSet[(LabelType, LabelType)]` where the first tuple field is the true 
label value and the second field is the predicted label value.
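    
    The derivation of `evaluate` from an element-level predict can be 
sketched as follows. Again this is a simplified illustration under stated 
assumptions, not the PR's implementation: `Seq` replaces `DataSet`, and 
`signPredict` and the free-standing `evaluate` function are hypothetical.

```scala
// Simplified sketch: Seq stands in for Flink's DataSet. Names and signatures
// are illustrative assumptions, not the actual FlinkML definitions.
object EvaluateSketch {
  // Element level: one input element plus the model yields one predicted label.
  trait PredictOperation[Model, Testing, Label] {
    def predict(value: Testing, model: Model): Label
  }

  // Input:  (testing value, true label)
  // Output: (true label, predicted label)
  def evaluate[Model, Testing, Label](
      op: PredictOperation[Model, Testing, Label],
      model: Model,
      testing: Seq[(Testing, Label)]): Seq[(Label, Label)] =
    testing.map { case (value, truth) => (truth, op.predict(value, model)) }

  // Hypothetical predictor: the label is the sign of value * weight.
  val signPredict: PredictOperation[Double, Double, Int] =
    new PredictOperation[Double, Double, Int] {
      def predict(value: Double, weight: Double): Int =
        if (value * weight >= 0.0) 1 else -1
    }

  def main(args: Array[String]): Unit = {
    val pairs = evaluate(signPredict, 2.0, Seq((1.5, 1), (-0.5, 1), (-3.0, -1)))
    // prints List((1,1), (1,-1), (-1,-1))
    println(pairs)
    // The (truth, prediction) pairs can feed any score, for example accuracy.
    val accuracy = pairs.count { case (t, p) => t == p }.toDouble / pairs.size
    println(accuracy)
  }
}
```

    Returning the raw (true label, predicted label) pairs rather than a 
score leaves the choice of metric to the caller.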
    
    What do you think of these changes? Will they ease the development of 
future `Predictor`s?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink evaluatePredictor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/772.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #772
    
----
commit 49c02514a6a23d7ef95ce46966ff7ee7a1f407ad
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2015-06-02T12:34:27Z

    [FLINK-2116] [ml] Adds evaluate method to Predictor. Adds PredictOperation 
which can be reused by evaluate if the input data is of the format 
(TestingType, LabelType) where the second tuple field represents the true label.

----

