[ 
https://issues.apache.org/jira/browse/FLINK-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570965#comment-14570965
 ] 

ASF GitHub Bot commented on FLINK-2116:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/772

    [FLINK-2116] [ml] Reusing predict operation for evaluation

    This PR adds an `evaluate` method to `Predictor` which takes a 
`DataSet[Testing]` and returns a `DataSet[(LabelType, LabelType)]`, where the 
first tuple field is the true label and the second field denotes the predicted 
label. The evaluation logic is defined via an `EvaluateDataSetOperation`.
    
    Since predicting test data and evaluating test data both use the same 
prediction logic, a new level of abstraction was introduced. The old 
`PredictOperation` is now called `PredictDataSetOperation` and a new 
`PredictOperation` was defined. The `PredictOperation` takes an element of the 
dataset as well as the model of the associated `Predictor` and calculates one 
prediction.
    
    The predict operation of a `Predictor` can be implemented either at the 
level of `PredictDataSetOperation`, which gives access to the whole `DataSet` 
of input elements, or at the level of `PredictOperation`. In the latter case, 
the system automatically applies the operation to all elements of the input 
`DataSet` (see `Predictor.defaultPredictDataSetOperation`).
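    
    A minimal sketch of how an element-level operation could be lifted to the 
dataset level. This is not the code from the PR: `Seq` stands in for Flink's 
`DataSet`, the trait signatures are assumptions, and `PredictSketch` and 
`thresholdPredict` are hypothetical names used only for illustration.

```scala
// Hypothetical stand-ins: Seq plays the role of Flink's DataSet, and the
// trait names only mirror those in the PR; the signatures are assumptions.
object PredictSketch {
  // Element-level operation: one input element plus the model gives one prediction.
  trait PredictOperation[Model, Testing, Prediction] {
    def predict(value: Testing, model: Model): Prediction
  }

  // Dataset-level operation: sees the whole collection of input elements.
  trait PredictDataSetOperation[Model, Testing, Prediction] {
    def predictDataSet(model: Model, input: Seq[Testing]): Seq[Prediction]
  }

  // Lifts an element-level operation to the dataset level by mapping it over
  // every element, analogous in spirit to defaultPredictDataSetOperation.
  def defaultPredictDataSetOperation[M, T, P](
      op: PredictOperation[M, T, P]): PredictDataSetOperation[M, T, P] =
    new PredictDataSetOperation[M, T, P] {
      def predictDataSet(model: M, input: Seq[T]): Seq[P] =
        input.map(x => op.predict(x, model))
    }

  // Toy model for illustration: the "model" is just a threshold on doubles.
  val thresholdPredict: PredictOperation[Double, Double, Int] =
    new PredictOperation[Double, Double, Int] {
      def predict(value: Double, model: Double): Int =
        if (value >= model) 1 else -1
    }

  def main(args: Array[String]): Unit = {
    val lifted = defaultPredictDataSetOperation(thresholdPredict)
    println(lifted.predictDataSet(0.5, Seq(0.1, 0.7, 0.5)))
  }
}
```

    In this sketch, an implementer who only needs per-element logic writes 
`predict` once and gets the dataset-level operation without further code.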
    
    Once a `PredictOperation` is defined, `evaluate` can be called on this 
`Predictor` automatically, without defining an `EvaluateDataSetOperation`. The 
only constraint is that the input data has to be `DataSet[(TestingType, 
LabelType)]`, that is, a tuple of a testing value and its true label. The 
system then calculates the prediction for the testing value and returns a 
`DataSet[(LabelType, LabelType)]` whose first field is the true label and 
whose second field is the predicted label.
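    
    The derivation of `evaluate` from an element-level prediction can be 
sketched as follows. Again, this is an illustration under assumptions rather 
than the PR's implementation: `Seq` replaces `DataSet`, and `EvaluateSketch`, 
the `evaluate` signature, and `signPredict` are hypothetical.

```scala
// Hypothetical sketch: Seq stands in for Flink's DataSet, and the signature
// of evaluate is an assumption, not the one used in the PR.
object EvaluateSketch {
  // Element-level prediction: one testing value plus the model gives one label.
  trait PredictOperation[Model, Testing, Label] {
    def predict(value: Testing, model: Model): Label
  }

  // Each (testing value, true label) pair becomes (true label, predicted label).
  def evaluate[M, T, L](op: PredictOperation[M, T, L], model: M)(
      testing: Seq[(T, L)]): Seq[(L, L)] =
    testing.map { case (value, truth) => (truth, op.predict(value, model)) }

  // Toy predictor for illustration: the label is the sign of the input.
  val signPredict: PredictOperation[Unit, Double, Int] =
    new PredictOperation[Unit, Double, Int] {
      def predict(value: Double, model: Unit): Int =
        if (value >= 0.0) 1 else -1
    }

  def main(args: Array[String]): Unit = {
    val labelled = Seq((1.5, 1), (-2.0, 1), (-0.5, -1))
    evaluate(signPredict, ())(labelled).foreach(println)
  }
}
```

    The resulting pairs can then be fed to any scoring function that compares 
true and predicted labels.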
    
    What do you think of these changes? Will they ease the development of 
future `Predictor`s?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink evaluatePredictor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/772.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #772
    
----
commit 49c02514a6a23d7ef95ce46966ff7ee7a1f407ad
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2015-06-02T12:34:27Z

    [FLINK-2116] [ml] Adds evaluate method to Predictor. Adds PredictOperation 
which can be reused by evaluate if the input data is of the format 
(TestingType, LabelType) where the second tuple field represents the true label.

----


> Make pipeline extension require less coding
> -------------------------------------------
>
>                 Key: FLINK-2116
>                 URL: https://issues.apache.org/jira/browse/FLINK-2116
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Mikio Braun
>            Assignee: Till Rohrmann
>            Priority: Minor
>
> Right now, implementing methods from the pipelines for new types, or even 
> adding new methods to pipelines requires many steps:
> 1) implementing methods for new types
>   implement implicit of the corresponding class encapsulating the operation 
> in the companion object
> 2) adding methods to the pipeline
>   - adding a method
>   - adding a trait for the operation
>   - implement implicit in the companion object
> These are all objects which contain many generic parameters, so reducing the 
> work would be great.
> The goal should be that you can really focus on the code to add, and have as 
> little boilerplate code as possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
