Re: Efficient way of scoring all items and users in an ALS model

2015-02-23 Thread Xiangrui Meng
You can use rdd.cartesian and then find the top-k by key, which distributes the work to the executors. One trick to boost performance: blockify the user/product features first, then use native matrix-matrix multiplication on each pair of blocks. There is a relevant PR from Deb: https://github.com/apache/spark/pull/3098
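
For illustration only, here is a minimal local sketch of that idea in plain Python, not the code from the PR and not Spark code. It assumes the factors are available as lists of (id, vector) pairs; the names blockify, block_scores and top_k_per_user are hypothetical. In a real job the loop over block pairs would be an rdd.cartesian over the blocked RDDs, and block_scores would be a native BLAS matrix-matrix multiply rather than nested Python loops.

```python
import heapq
from itertools import product as cartesian


def blockify(features, block_size):
    """Group (id, vector) pairs into blocks of (ids, matrix rows)."""
    blocks = []
    for start in range(0, len(features), block_size):
        chunk = features[start:start + block_size]
        ids = [i for i, _ in chunk]
        rows = [vec for _, vec in chunk]
        blocks.append((ids, rows))
    return blocks


def block_scores(user_rows, item_rows):
    """Compute U * V^T for one pair of blocks (stand-in for a native gemm)."""
    return [[sum(u * v for u, v in zip(urow, vrow)) for vrow in item_rows]
            for urow in user_rows]


def top_k_per_user(user_features, item_features, k, block_size=2):
    """Score every user against every item and keep the k best items per user."""
    user_blocks = blockify(user_features, block_size)
    item_blocks = blockify(item_features, block_size)
    best = {}  # user id -> up to k (score, item id) pairs seen so far
    # Each (user block, item block) pair is an independent unit of work,
    # which is what the cartesian product would hand to the executors.
    for (uids, urows), (iids, irows) in cartesian(user_blocks, item_blocks):
        scores = block_scores(urows, irows)
        for uid, row in zip(uids, scores):
            candidates = best.get(uid, []) + list(zip(row, iids))
            # Merge step: this is the "top-k by key" reduction.
            best[uid] = heapq.nlargest(k, candidates)
    return {uid: [(iid, s) for s, iid in pairs] for uid, pairs in best.items()}


# Made-up rank-2 factors, purely for demonstration.
users = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (3, [1.0, 2.0])]
items = [(10, [2.0, 0.0]), (20, [0.0, 3.0]), (30, [1.0, 1.0])]
print(top_k_per_user(users, items, k=2))
```

Working on blocks means each task does one matrix product over many users and items at once instead of millions of individual dot products, and only k candidates per user survive each merge, so the full user-by-item score matrix is never materialized in one place.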

Efficient way of scoring all items and users in an ALS model

2015-02-23 Thread Erlend Hamnaberg
Hi. We are using the ALS model and would like to score all users and items. Currently we have these methods: https://gist.github.com/hamnis/e396854f4654bd46ebe0 We want to distribute the calculations to the slaves so we don't have to do this on the master. Is there an efficient