Github user thvasilo commented on the issue: https://github.com/apache/flink/pull/2838

Hello Gabor,

I like the idea of having a RankingScore: a hierarchy with Score, RankingScore, and PairWiseScore gives us the flexibility we need to bring ranking and supervised learning evaluation under the same umbrella (a rough sketch of what I have in mind is at the end of this comment). I would also encourage you to share any other ideas you broached that might break the API; this is still very much an evolving project, and there is no need to shoehorn everything into an `evaluate(test: TestType): DataSet[Double]` function if there are better alternatives.

One thing we need to consider is how this affects cross-validation and model selection/hyper-parameter tuning. These two aspects of the library are tightly linked, and I think we'll need to work on them in parallel to find issues that affect both. I recommend taking a look at the [cross-validation PR](https://github.com/apache/flink/pull/891) I opened a while back, and making a new WIP PR that uses the current one (#2838) as a basis. Since the `Score` interface still exists it shouldn't require many changes; all that's added is the CrossValidation class. There are other fundamental issues with the sampling there that we can discuss in due time (the second sketch below illustrates one of them).

Regarding the RankingPredictor, we should consider the use case for such an interface. Is it only going to be used for recommendation? If so, in what cases could we build a Pipeline with current or future pre-processing steps? Could you give some pipeline examples in a recommendation setting? (The last sketch below shows the kind of example I'm after.)
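To make the hierarchy concrete, here is a minimal sketch of how the three traits could relate. The type parameters, the `(user, item, rank)` encoding, and the `MeanSquaredError` object are my assumptions for discussion, not code from this PR:

```scala
import org.apache.flink.api.scala._

// All names below are placeholders for discussion, not the PR's API.
trait Score[PredictionType] {
  // Every score, ranking or supervised, reduces to a single number.
  def evaluate(predictions: DataSet[PredictionType]): DataSet[Double]
}

// Pairwise scores compare one (truth, prediction) pair per element.
trait PairWiseScore[T] extends Score[(T, T)]

// Ranking scores consume per-user ranked lists as (user, item, rank) triples.
trait RankingScore extends Score[(Int, Int, Int)]

// Example pairwise score: mean squared error over (truth, prediction) pairs.
object MeanSquaredError extends PairWiseScore[Double] {
  override def evaluate(pairs: DataSet[(Double, Double)]): DataSet[Double] =
    pairs
      .map { case (truth, pred) => ((truth - pred) * (truth - pred), 1L) }
      .reduce((a, b) => (a._1 + b._1, a._2 + b._2))
      .map { case (sumSq, count) => sumSq / count }
}
```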
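On the cross-validation side, a hypothetical k-fold splitter on top of the DataSet API could look like the following. The function name and the random-tag approach are assumptions, and the comment points at the kind of sampling issue I mentioned above:

```scala
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.scala._
import scala.reflect.ClassTag
import scala.util.Random

// Hypothetical k-fold splitter; name and approach are assumptions.
def kFold[T: TypeInformation: ClassTag](data: DataSet[T], k: Int)
    : Seq[(DataSet[T], DataSet[T])] = {
  // Tag every element with a random fold index. Because this map is
  // re-executed lazily by every job that consumes it, the folds are not
  // stable across executions -- one of the sampling issues I alluded to.
  val tagged = data.map(x => (Random.nextInt(k), x))
  (0 until k).map { i =>
    val train = tagged.filter(_._1 != i).map(_._2)
    val test  = tagged.filter(_._1 == i).map(_._2)
    (train, test)
  }
}
```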
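And to anchor the last question, this is roughly the shape of contract I imagine for a RankingPredictor; the method name and the `(user, item, rank)` encoding are hypothetical:

```scala
import org.apache.flink.api.scala._

// A sketch of a possible RankingPredictor contract; everything here is
// hypothetical and only meant to anchor the pipeline discussion.
trait RankingPredictor {
  // Emit the top-k recommendations per user as (user, item, rank) triples.
  def predictRankings(users: DataSet[Int], k: Int): DataSet[(Int, Int, Int)]
}
```

The open question for me is what the pre-processing steps would emit so that something like `transformer.chainPredictor(predictor)` still type-checks when the predictor produces rankings rather than scalar predictions.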