In fact, prec@k is similar to HR, and ndcg@k is similar to ARHR. After studying them, I could not find a single best measure for evaluating a recommendation system.
Xiangrui, do you think it is reasonable to create a class that provides popular measures for evaluating recommendation systems? Popular measures include precision, coverage, diversity, and so on. Most of them can be found in the book Recommender Systems Handbook.

From: Xiangrui Meng [mailto:men...@gmail.com]
Sent: August 26, 2014 3:28
To: Lizhengbing (bing, BIPA)
Cc: dev@spark.apache.org
Subject: Re: I want to contribute MLlib two quality measures(ARHR and HR) for top N recommendation system. Is this meaningful?

The evaluation metrics are definitely useful. How do they differ from traditional IR metrics like prec@k and ndcg@k?

-Xiangrui

On Mon, Aug 25, 2014 at 2:14 AM, Lizhengbing (bing, BIPA) <zhengbing...@huawei.com> wrote:

Hi:

In the paper “Item-Based Top-N Recommendation Algorithms” (https://stuyresearch.googlecode.com/hg/blake/resources/10.1.1.102.4451.pdf), two measures are used to evaluate the quality of recommendations: HR and ARHR.

If I use ALS (implicit) for a top-N recommendation system, I want to check its quality. ARHR and HR are two good quality measures, and I would like to contribute them to Spark MLlib. Would this be meaningful?

(1) If n is the total number of customers/users, the hit-rate of the recommendation algorithm is computed as:

    hit-rate (HR) = Number of hits / n

(2) If h is the number of hits that occurred at positions p1, p2, ..., ph within the top-N lists (i.e., 1 ≤ pi ≤ N), then the average reciprocal hit-rank is equal to:

    ARHR = (1 / n) * (1/p1 + 1/p2 + ... + 1/ph)
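
For illustration only, here is a minimal Python sketch of the two definitions above. It is not MLlib code. It assumes one held-out test item per user, and the names `hit_positions`, `hit_rate`, `arhr`, `top_n`, and `held_out` are hypothetical, chosen just for this example.

```python
def hit_positions(top_n, held_out):
    """Return the 1-based position of each user's held-out item, for hits only.

    top_n:    dict mapping user -> ranked list of recommended items (length N)
    held_out: dict mapping user -> the single hidden test item (assumption)
    """
    positions = []
    for user, target in held_out.items():
        recs = top_n.get(user, [])
        if target in recs:
            positions.append(recs.index(target) + 1)  # positions start at 1
    return positions


def hit_rate(top_n, held_out):
    """HR = number of hits / n, where n is the number of users."""
    n = len(held_out)
    return len(hit_positions(top_n, held_out)) / n if n else 0.0


def arhr(top_n, held_out):
    """ARHR = (1 / n) * sum of 1 / p_i over the h hit positions."""
    n = len(held_out)
    return sum(1.0 / p for p in hit_positions(top_n, held_out)) / n if n else 0.0


# Toy data: 4 users, N = 3. u1 hits at position 1, u2 at position 2,
# u3 and u4 miss.
top_n = {
    "u1": ["a", "b", "c"],
    "u2": ["d", "e", "f"],
    "u3": ["g", "h", "i"],
    "u4": ["j", "k", "l"],
}
held_out = {"u1": "a", "u2": "e", "u3": "z", "u4": "x"}

print(hit_rate(top_n, held_out))  # 0.5   (2 hits / 4 users)
print(arhr(top_n, held_out))      # 0.375 ((1/1 + 1/2) / 4)
```

HR counts every hit equally, while ARHR gives more weight to hits near the top of the list, so ARHR can never exceed HR.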