A PR for adding RankingEvaluator to ML already exists - https://github.com/apache/spark/pull/12461 - and I need to revive and review it. DB, your review would be welcome too (and also on https://github.com/apache/spark/issues/12574, which has implications for the semantics of ranking metrics in the DataFrame-style API).
Also see this discussion here - https://github.com/apache/spark/pull/12461#discussion-diff-60469791 - comments welcome.

N

On Mon, 19 Sep 2016 at 06:37 DB Tsai <dbt...@dbtsai.com> wrote:

> Hi Jong,
>
> I think the definition from Kaggle is correct. I'm working on
> implementing ranking metrics in Spark ML now, but the timeline is
> unknown. Feel free to submit a PR for this in MLlib.
>
> Thanks.
>
> Sincerely,
>
> DB Tsai
> ----------------------------------------------------------
> Web: https://www.dbtsai.com
> PGP Key ID: 0xAF08DF8D
>
>
> On Sun, Sep 18, 2016 at 8:42 PM, Jong Wook Kim <jongw...@nyu.edu> wrote:
> > Hi,
> >
> > I'm trying to evaluate a recommendation model, and found that Spark and
> > Rival give different results, and it seems that Rival's result is what
> > Kaggle defines:
> > https://gist.github.com/jongwook/5d4e78290eaef22cb69abbf68b52e597
> >
> > Am I using RankingMetrics in the wrong way, or is Spark's implementation
> > incorrect?
> >
> > To my knowledge, NDCG should depend on the relevance (or preference)
> > values, but Spark's implementation seems not to; it uses 1.0 where it
> > should be 2^(relevance) - 1, probably assuming that relevance is always
> > 1.0. I also tried tweaking it, but its method of obtaining the ideal DCG
> > also seems wrong.
> >
> > Any feedback from MLlib developers would be appreciated. I made a
> > modified/extended version of RankingMetrics that produces numbers
> > identical to Kaggle's and Rival's results, and I'm wondering if it is
> > something appropriate to be added back to MLlib.
> >
> > Jong Wook
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
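For reference, here is a minimal sketch of the graded-relevance NDCG that Jong Wook describes in the quoted message: per-item gain of 2^relevance - 1, a log2(rank + 1) discount, and an ideal DCG taken from the relevance values sorted in descending order. This is not Spark's RankingMetrics code; the `NdcgSketch` object, the item ids, and the relevance map are illustrative assumptions only.

object NdcgSketch {
  // DCG over a list of gains; the item at index i has rank i + 1,
  // so its discount is log2(i + 2).
  def dcg(gains: Seq[Double]): Double =
    gains.zipWithIndex.map { case (g, i) =>
      g / (math.log(i + 2) / math.log(2))
    }.sum

  def ndcgAt(k: Int, predicted: Seq[String], relevance: Map[String, Double]): Double = {
    // Gain of each predicted item: 2^rel - 1 (0 when the item has no known relevance).
    val predictedGains = predicted.take(k).map(id => math.pow(2, relevance.getOrElse(id, 0.0)) - 1)
    // Ideal DCG: the k largest relevance values, independent of the predicted order.
    val idealGains = relevance.values.toSeq.sortBy(-_).take(k).map(r => math.pow(2, r) - 1)
    val idealDcg = dcg(idealGains)
    if (idealDcg == 0.0) 0.0 else dcg(predictedGains) / idealDcg
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical example: graded relevance per item id and a ranked prediction list.
    val relevance = Map("a" -> 3.0, "b" -> 2.0, "c" -> 1.0)
    val predicted = Seq("c", "a", "d", "b")
    println(f"NDCG@3 = ${ndcgAt(3, predicted, relevance)}%.4f")
  }
}

With all gains fixed at 1.0 this reduces to the binary behaviour the thread attributes to the current implementation, which is where the discrepancy with Kaggle's and Rival's numbers would come from if the inputs carry graded relevance.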