[ 
https://issues.apache.org/jira/browse/FLINK-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572386#comment-14572386
 ] 

Till Rohrmann commented on FLINK-2116:
--------------------------------------

At the moment, the corresponding PR only contains the {{evaluate}} method, which 
gives you a {{DataSet}} of tuples {{(true label, predicted label)}}. These tuples 
can then be used to calculate accuracy scores, but that part has not been 
implemented yet. 
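
As a rough sketch of the kind of score that could be computed from these 
tuples, the following uses a plain Scala {{Seq}} as a stand-in for the 
{{DataSet}}. The names {{AccuracySketch}} and {{accuracy}} are hypothetical 
and not part of the PR.

```scala
object AccuracySketch {
  // Fraction of pairs whose predicted label equals the true label.
  // Each pair is (true label, predicted label), the shape evaluate returns.
  // Assumption: a local Seq stands in for a distributed DataSet.
  def accuracy(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one (truth, prediction) pair")
    val correct = pairs.count { case (truth, prediction) => truth == prediction }
    correct.toDouble / pairs.size
  }
}
```

Exact equality is only meaningful for discrete class labels; a regression 
score would compare the two values differently.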

With this PR I wanted to get some feedback on the general design of the 
pipelines with the {{evaluate}} method, and on whether it makes sense to use 
{{Tuples}} as input instead of {{LabeledVector}}. Maybe there is also some 
other way to automatically extract a label value from a parameterized type, 
so that the default {{EvaluateDataSetOperation}} works on {{LabeledVector}} 
if you only specify a {{PredictOperation}}.
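
One possible way to extract the label automatically is a small type class, 
sketched below on plain Scala collections. Everything here is an assumption 
made for illustration: {{LabelExtractor}}, the simplified {{LabeledVector}} 
case class, the {{(testing value, label)}} tuple layout and this {{evaluate}} 
signature are not taken from the PR.

```scala
object LabelExtractionSketch {
  // Hypothetical, simplified stand-in for FlinkML's LabeledVector.
  final case class LabeledVector(label: Double, features: Vector[Double])

  // Type class: how to split a value of type T into its label and the
  // part that is handed to the prediction function.
  trait LabelExtractor[T, Testing] {
    def label(value: T): Double
    def testing(value: T): Testing
  }

  object LabelExtractor {
    implicit val labeledVector: LabelExtractor[LabeledVector, Vector[Double]] =
      new LabelExtractor[LabeledVector, Vector[Double]] {
        def label(v: LabeledVector): Double = v.label
        def testing(v: LabeledVector): Vector[Double] = v.features
      }

    // Assumed tuple layout: (testing value, label).
    implicit def tuple[A]: LabelExtractor[(A, Double), A] =
      new LabelExtractor[(A, Double), A] {
        def label(v: (A, Double)): Double = v._2
        def testing(v: (A, Double)): A = v._1
      }
  }

  // Default evaluation derived from a predict function alone; returns
  // (true label, predicted label) pairs.
  def evaluate[T, Testing](data: Seq[T])(predict: Testing => Double)(
      implicit extractor: LabelExtractor[T, Testing]): Seq[(Double, Double)] =
    data.map(v => (extractor.label(v), predict(extractor.testing(v))))
}
```

With this shape, both tuples and {{LabeledVector}} are accepted by the same 
{{evaluate}} call, and supporting another input type only requires one more 
implicit instance.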

My gut feeling is also that we should keep the calculation of the evaluation 
score separate from the actual {{Predictor}}, because if you have a pipeline, 
then it's no longer easy to access the members of the {{Predictor}} which are 
only defined in the corresponding subclass. Moreover, maybe sometimes you want 
to apply different scores to your method depending on the use case.
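
To illustrate the separation, a score can be modeled as a plain function 
over the {{(true label, predicted label)}} pairs, so it never has to reach 
into the {{Predictor}} that produced them. This is only a sketch; the 
{{Score}} alias and both score functions are hypothetical names, and a 
{{Seq}} again stands in for the {{DataSet}}.

```scala
object ScoreSketch {
  // A score only sees (true label, predicted label) pairs.
  // Assumption: the input is non-empty; an empty input yields NaN.
  type Score = Seq[(Double, Double)] => Double

  val meanSquaredError: Score = pairs =>
    pairs.map { case (truth, prediction) =>
      (truth - prediction) * (truth - prediction)
    }.sum / pairs.size

  val meanAbsoluteError: Score = pairs =>
    pairs.map { case (truth, prediction) =>
      math.abs(truth - prediction)
    }.sum / pairs.size
}
```

Because the pairs are the only input, the same pipeline output can be scored 
with whichever function suits the use case.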

We should definitely open a new JIRA issue for the implementation of an 
evaluation framework.

> Make pipeline extension require less coding
> -------------------------------------------
>
>                 Key: FLINK-2116
>                 URL: https://issues.apache.org/jira/browse/FLINK-2116
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Mikio Braun
>            Assignee: Till Rohrmann
>            Priority: Minor
>
> Right now, implementing methods from the pipelines for new types, or even 
> adding new methods to pipelines requires many steps:
> 1) implementing methods for new types
>   - implement an implicit of the corresponding class encapsulating the 
>     operation in the companion object
> 2) adding methods to the pipeline
>   - add a method
>   - add a trait for the operation
>   - implement an implicit in the companion object
> These are all objects which contain many generic parameters, so reducing the 
> work would be great.
> The goal should be that you can really focus on the code to add, and have as 
> little boilerplate code as possible.
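
The steps listed in the issue description can be sketched roughly as 
follows. This is a simplified illustration of the pattern, not the FlinkML 
code: {{TransformOp}}, {{Transformable}} and {{Scaler}} are hypothetical 
names, and plain values stand in for {{DataSet}}s.

```scala
object PipelineBoilerplateSketch {
  // "Adding a trait for the operation": one type class per pipeline method.
  trait TransformOp[Self, In, Out] {
    def transform(instance: Self, input: In): Out
  }

  // "Adding a method": the pipeline-side method only delegates to the
  // implicit operation selected for the input type.
  trait Transformable[Self] { self: Self =>
    def transform[In, Out](input: In)(
        implicit op: TransformOp[Self, In, Out]): Out =
      op.transform(this, input)
  }

  class Scaler(val factor: Double) extends Transformable[Scaler]

  // "Implement implicit in the companion object": one instance per
  // supported input type.
  object Scaler {
    implicit val scaleDouble: TransformOp[Scaler, Double, Double] =
      new TransformOp[Scaler, Double, Double] {
        def transform(instance: Scaler, input: Double): Double =
          instance.factor * input
      }
  }
}
```

Even in this reduced form, supporting one input type touches three places, 
each with several type parameters, which is the overhead the issue asks to 
reduce.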



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
