A PR for adding RankingEvaluator to ML already exists - https://github.com/apache/spark/pull/12461 - and I need to revive and review it. DB, your review would be welcome too (and also on https://github.com/apache/spark/issues/12574, which has implications for the semantics of ranking metrics in the DataFrame-style API).
Also see this discussion here - https://github.com/apache/spark/pull/12461#discussion-diff-60469791 - comments welcome.

N

On Mon, 19 Sep 2016 at 06:37 DB Tsai <dbt...@dbtsai.com> wrote:

> Hi Jong,
>
> I think the definition from Kaggle is correct. I'm working on
> implementing ranking metrics in Spark ML now, but the timeline is
> unknown. Feel free to submit a PR for this in MLlib.
>
> Thanks.
>
> Sincerely,
>
> DB Tsai
> ----------------------------------------------------------
> Web: https://www.dbtsai.com
> PGP Key ID: 0xAF08DF8D
>
>
> On Sun, Sep 18, 2016 at 8:42 PM, Jong Wook Kim <jongw...@nyu.edu> wrote:
> > Hi,
> >
> > I'm trying to evaluate a recommendation model, and found that Spark and
> > Rival give different results, and it seems that Rival's result is what
> > Kaggle defines:
> > https://gist.github.com/jongwook/5d4e78290eaef22cb69abbf68b52e597
> >
> > Am I using RankingMetrics in the wrong way, or is Spark's implementation
> > incorrect?
> >
> > To my knowledge, NDCG should depend on the relevance (or preference)
> > values, but Spark's implementation seems not to; it uses 1.0 where it
> > should be 2^(relevance) - 1, probably assuming that relevance is always
> > 1.0. I also tried tweaking it, but its method of obtaining the ideal DCG
> > also seems wrong.
> >
> > Any feedback from MLlib developers would be appreciated. I made a
> > modified/extended version of RankingMetrics that produces numbers
> > identical to Kaggle's and Rival's results, and I'm wondering if it is
> > something appropriate to be added back to MLlib.
> >
> > Jong Wook
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
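For reference, here is a minimal sketch of the graded-relevance NDCG that Jong Wook describes in the quoted message: per-item gain of 2^relevance - 1, a log2(rank + 1) discount, and an ideal DCG taken from the relevance values sorted in descending order. This is not Spark's RankingMetrics code; the `NdcgSketch` object, the item ids, and the relevance map are illustrative assumptions only.

object NdcgSketch {
  // DCG over a list of gains; the item at index i has rank i + 1,
  // so its discount is log2(i + 2).
  def dcg(gains: Seq[Double]): Double =
    gains.zipWithIndex.map { case (g, i) =>
      g / (math.log(i + 2) / math.log(2))
    }.sum

  def ndcgAt(k: Int, predicted: Seq[String], relevance: Map[String, Double]): Double = {
    // Gain of each predicted item: 2^rel - 1 (0 when the item has no known relevance).
    val predictedGains = predicted.take(k).map(id => math.pow(2, relevance.getOrElse(id, 0.0)) - 1)
    // Ideal DCG: the k largest relevance values, independent of the predicted order.
    val idealGains = relevance.values.toSeq.sortBy(-_).take(k).map(r => math.pow(2, r) - 1)
    val idealDcg = dcg(idealGains)
    if (idealDcg == 0.0) 0.0 else dcg(predictedGains) / idealDcg
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical example: graded relevance per item id and a ranked prediction list.
    val relevance = Map("a" -> 3.0, "b" -> 2.0, "c" -> 1.0)
    val predicted = Seq("c", "a", "d", "b")
    println(f"NDCG@3 = ${ndcgAt(3, predicted, relevance)}%.4f")
  }
}

With all gains fixed at 1.0 this reduces to the binary behaviour the thread attributes to the current implementation, which is where the discrepancy with Kaggle's and Rival's numbers would come from if the inputs carry graded relevance.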