[ https://issues.apache.org/jira/browse/FLINK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15683167#comment-15683167 ]

ASF GitHub Bot commented on FLINK-4712:
---------------------------------------

GitHub user gaborhermann opened a pull request:

    https://github.com/apache/flink/pull/2838

    [FLINK-4712] [FLINK-4713] [ml] Ranking recommendation & evaluation (WIP)

    Please note that this is a work-in-progress PR for discussing API design decisions. We propose a class hierarchy that fits ranking evaluations into the proposed evaluation framework (see [PR](https://github.com/apache/flink/pull/1849)).
    The features are mostly working, but documentation is missing and some minor refactoring is needed. The evaluations currently work only with top-100 rankings (the cutoff is hard-coded), which we still need to fix. We need feedback on two main design decisions before we can move the PR forward. Thanks for any comments!
    
    ### `RankingPredictor`
    
    We have managed to rework the evaluation framework proposed by @thvasilo so that ranking predictions fit in. Our approach is to use separate `RankingPredictor` and `Predictor` traits (sketched below). One main problem remains, however: there is no common superclass of `RankingPredictor` and `Predictor`, so the pipelining mechanism might not work. A `Predictor` can only sit at the end of a pipeline, so this should not really be a problem, but I do not know for sure. An alternative solution would be to have two distinct objects, `ALS` and `RankingALS`, that give different predictions but both extend only `Predictor`, with implicit conversions between the two. I would prefer the current solution if it does not break pipelining. @thvasilo, what do you think?
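    A minimal sketch of the trait split (signatures heavily simplified; the real traits carry `ParameterMap`s and operation type classes, see the diff):

```scala
import org.apache.flink.api.scala._

// Heavily simplified sketch of the split. The existing Predictor gives
// pointwise predictions, e.g. a predicted rating per (user, item) pair.
trait Predictor[Self] {
  def predict(input: DataSet[(Int, Int)]): DataSet[(Int, Int, Double)]
}

// The proposed RankingPredictor gives a top-K ranking per user as
// (user, item, rank) triples. Note: no common supertrait with Predictor,
// which is the open pipelining question above.
trait RankingPredictor[Self] {
  def predictRankings(k: Int, users: DataSet[Int]): DataSet[(Int, Int, Int)]
}

// ALS would mix in both, so one fitted model can serve both
// rating predictions and top-K rankings:
// class ALS extends Predictor[ALS] with RankingPredictor[ALS] { ... }
```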
    
    (This seems similar to the `predict_proba` function on scikit-learn classification models, where the same model gives two different predictions for the same input: `predict` for discrete labels and `predict_proba` for class probabilities.)
    
    ### Generalizing `EvaluateDataSetOperation`
    
    On the other hand, we seem to have solved the scoring issue: users can evaluate a recommendation algorithm such as ALS either with a score operating on rankings (e.g. nDCG) or with a score operating on ratings (e.g. RMSE). They only need to change the `Score` used in their code, nothing else.
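    A hypothetical usage sketch (this `evaluate` signature and the score names are illustrative, not the final API):

```scala
import org.apache.flink.api.scala._

// Illustrative only: ALS exists in FlinkML, but this `evaluate` overload
// and the RmseScore/NdcgScore objects are assumptions for the sketch.
val als = ALS().setNumFactors(10).setIterations(10)
als.fit(trainingRatings)  // trainingRatings: DataSet[(Int, Int, Double)]

// Rating-based evaluation: RMSE on (trueRating, predictedRating) pairs.
val rmse: DataSet[Double] = als.evaluate(testRatings, RmseScore)

// Ranking-based evaluation: nDCG on top-K rankings.
// Only the Score changes; the call site is otherwise identical.
val ndcg: DataSet[Double] = als.evaluate(testRatings, NdcgScore)
```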
    
    The main problem was that the `evaluate` method and `EvaluateDataSetOperation` were not general enough. They reduce the evaluation input to `(trueValue, predictedValue)` pairs (i.e. a `DataSet[(PredictionType, PredictionType)]`), while ranking evaluations need a more general input: the true ratings (`DataSet[(Int,Int,Double)]`) and the predicted rankings (`DataSet[(Int,Int,Int)]`).
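    In (simplified) types, the existing operation looks roughly like this, which fixes the prepared shape to pairs:

```scala
import org.apache.flink.api.scala._
import org.apache.flink.ml.common.ParameterMap

// Roughly the shape from the evaluation framework PR (simplified):
// the test set is collapsed into (trueValue, predictedValue) pairs
// before scoring, which rules out ranking-style inputs.
trait EvaluateDataSetOperation[Instance, Testing, PredictionType] {
  def evaluateDataSet(
      instance: Instance,
      parameters: ParameterMap,
      testing: DataSet[Testing]): DataSet[(PredictionType, PredictionType)]
}
```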
    
    Instead of using `EvaluateDataSetOperation` directly, we use a more general `PrepareOperation`. We rename the `Score` of the original evaluation framework to `PairwiseScore`; `RankingScore` and `PairwiseScore` share a common trait `Score`. This way the user can use either a `RankingScore` or a `PairwiseScore` for a given model, and only needs to alter the score used in the code.
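    A rough sketch of the proposed hierarchy (type parameters simplified; the exact `PrepareOperation` signature is still open):

```scala
import org.apache.flink.api.scala._

// Common supertrait: a Score consumes some prepared input I
// and produces a single metric value.
trait Score[I] {
  def evaluate(prepared: I): DataSet[Double]
}

// The original Score, renamed: consumes (true, predicted) pairs.
trait PairwiseScore[P] extends Score[DataSet[(P, P)]]

// Ranking scores consume (true ratings, predicted rankings).
trait RankingScore
  extends Score[(DataSet[(Int, Int, Double)], DataSet[(Int, Int, Int)])]

// A PrepareOperation turns a model plus a test set into the input
// shape I that the chosen Score expects.
trait PrepareOperation[Instance, Testing, I] {
  def prepare(instance: Instance, testing: DataSet[Testing]): I
}
```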
    
    In the case of pairwise scores (which only need pairs of true and predicted values), `EvaluateDataSetOperation` is used as the `PrepareOperation`: it prepares the evaluation by creating `(trueValue, predictedValue)` pairs from the test dataset. Thus, the result of preparing, and the input of a `PairwiseScore`, is a `DataSet[(PredictionType, PredictionType)]`. In the case of rankings, the `PrepareOperation` passes the test dataset through and creates the rankings, so the result of preparing, and the input of a `RankingScore`, is a `(DataSet[(Int,Int,Double)], DataSet[(Int,Int,Int)])`. I believe this is a fairly acceptable solution that avoids breaking the API.
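    For the pairwise case the adapter could look roughly like this (a sketch built on the simplified signatures above, not the code in the diff):

```scala
import org.apache.flink.api.scala._
import org.apache.flink.ml.common.ParameterMap

// Sketch: any existing EvaluateDataSetOperation can act as the
// PrepareOperation of a PairwiseScore, since both produce
// DataSet[(PredictionType, PredictionType)].
class PairwisePrepare[Instance, Testing, P](
    evalOp: EvaluateDataSetOperation[Instance, Testing, P],
    parameters: ParameterMap)
  extends PrepareOperation[Instance, Testing, DataSet[(P, P)]] {

  override def prepare(
      instance: Instance,
      testing: DataSet[Testing]): DataSet[(P, P)] =
    evalOp.evaluateDataSet(instance, parameters, testing)
}
```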

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gaborhermann/flink ranking-rec-eval

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2838.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2838
    
----

> Implementing ranking predictions for ALS
> ----------------------------------------
>
>                 Key: FLINK-4712
>                 URL: https://issues.apache.org/jira/browse/FLINK-4712
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Domokos Miklós Kelen
>            Assignee: Gábor Hermann
>
> We started working on implementing ranking predictions for recommender 
> systems. Ranking prediction means that besides predicting scores for user-item 
> pairs, the recommender system is able to recommend a top-K list for each user.
> Details:
> In practice, this means finding the K items with the highest predicted rating 
> for a particular user. It should also be possible to specify whether to 
> exclude the already seen items from a particular user's top list (see for 
> example the 'exclude_known' setting of [Graphlab Create's ranking 
> factorization 
> recommender|https://turi.com/products/create/docs/generated/graphlab.recommender.ranking_factorization_recommender.RankingFactorizationRecommender.recommend.html#graphlab.recommender.ranking_factorization_recommender.RankingFactorizationRecommender.recommend]).
> The output of the topK recommendation function could be in the form of 
> {{DataSet[(Int,Int,Int)]}}, meaning (user, item, rank), similar to Graphlab 
> Create's output. However, this is debatable: follow-up work includes 
> implementing ranking recommendation evaluation metrics (such as precision@k, 
> recall@k, ndcg@k), similar to [Spark's 
> implementations|https://spark.apache.org/docs/1.5.0/mllib-evaluation-metrics.html#ranking-systems].
> It would be beneficial if we were able to design the API such that it could 
> be included in the proposed evaluation framework (see 
> [2157|https://issues.apache.org/jira/browse/FLINK-2157]), which makes it 
> necessary to consider the possible output types {{DataSet[(Int, 
> Array[Int])]}} or {{DataSet[(Int, Array[(Int,Double)])]}}, meaning (user, 
> array of items), possibly including the predicted scores as well. See 
> [4713|https://issues.apache.org/jira/browse/FLINK-4713] for details.
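> A hypothetical signature sketch contrasting the candidate output types (the 
> names and the {{excludeKnown}} parameter are illustrative only, echoing 
> Graphlab Create's 'exclude_known'):
> {code:scala}
> import org.apache.flink.api.scala._
>
> trait RankingRecommender {
>   // Option A: flat (user, item, rank) triples, as in Graphlab Create.
>   def recommendRanking(users: DataSet[Int], k: Int,
>       excludeKnown: Boolean): DataSet[(Int, Int, Int)]
>
>   // Option B: one row per user with the ranked items, which may fit the
>   // proposed evaluation framework (FLINK-2157) better:
>   // def recommendRankings(users: DataSet[Int], k: Int,
>   //     excludeKnown: Boolean): DataSet[(Int, Array[Int])]
> }
> {code}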
> Another question is whether to provide this function as a member of the ALS 
> class, as a switch-like parameter on the ALS implementation (meaning the 
> model is either a rating recommender or a ranking recommender), or in some 
> other way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
