Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2838

> The problem is not with the evaluate(test: TestType): DataSet[Double] but rather with evaluate(test: TestType): DataSet[(Prediction,Prediction)].

Completely agree there. I advocated for removing/renaming the `evaluate` function; we had previously considered using a `score` function for a more sklearn-like approach, see e.g. #902. Having _some_ function that returns a `DataSet[(truth: Prediction, pred: Prediction)]` is useful and probably necessary, but we should look at alternatives, as the current state is confusing. I like the approach you are suggesting, so feel free to come up with an alternative in the WIP PRs.

Getting rid of the Pipeline requirements for recommendation algorithms would simplify some things. In that case we would have to re-evaluate whether it makes sense for them to implement the `Predictor` interface at all. Another option would be to introduce a `ChainablePredictor`, but I think our hierarchy is deep enough already.
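For illustration, the two method shapes under discussion could be sketched roughly as follows. This is a hypothetical sketch, not the actual FlinkML `Predictor` code: the trait name `EvaluationSketch` is invented, and `TestType`/`Prediction` stand in for the real type parameters.

```scala
import org.apache.flink.api.scala.DataSet

// Hypothetical sketch of the two evaluation-method shapes discussed above.
// Names and signatures are illustrative, not the actual FlinkML API.
trait EvaluationSketch[TestType, Prediction] {

  // Current shape: pairs each true value with the model's prediction,
  // leaving metric computation (accuracy, RMSE, ...) to a separate step.
  def evaluate(test: DataSet[TestType]): DataSet[(Prediction, Prediction)]

  // sklearn-like alternative: directly return a single performance score,
  // along the lines of the earlier `score` discussion (#902).
  def score(test: DataSet[TestType]): DataSet[Double]
}
```

The first shape is more composable (any metric can be computed from the truth/prediction pairs), while the second is simpler for users who just want one number; the confusion arises when both are exposed under the same `evaluate` name.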