You can use rdd.cartesian and then find the top-k by key to distribute
the work to the executors. One trick to boost performance: blockify the
user/product features and then use native matrix-matrix multiplication
on each pair of blocks. There is a relevant PR from Deb:
https://github.com/apache/spark/pull/3098
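
To make the blockify idea concrete, here is a minimal local sketch in plain Python. It is not Spark code and not what the PR above implements. The helper names (blockify, score_block_pair, top_k_per_user) and the block_size parameter are made up for illustration. In a real job the nested loop over block pairs would be the output of rdd.cartesian on the two blocked RDDs, the per-pair dot products would be a single native matrix-matrix multiply, and the per-user heaps would be merged by key.

```python
import heapq


def blockify(features, block_size):
    """Group a list of (id, vector) pairs into blocks of at most block_size rows."""
    return [features[i:i + block_size]
            for i in range(0, len(features), block_size)]


def score_block_pair(user_block, item_block):
    """Score every user in one block against every item in another block.

    Yields (user_id, (score, item_id)). This nested loop stands in for one
    dense matrix-matrix product of the two blocks.
    """
    for user_id, user_vec in user_block:
        for item_id, item_vec in item_block:
            score = sum(a * b for a, b in zip(user_vec, item_vec))
            yield user_id, (score, item_id)


def top_k_per_user(user_features, item_features, k, block_size=2):
    """Return {user_id: [(item_id, score), ...]} with the k best items per user."""
    user_blocks = blockify(user_features, block_size)
    item_blocks = blockify(item_features, block_size)
    best = {}  # user_id -> min-heap holding at most k (score, item_id) pairs
    # Cartesian product of blocks (hypothetical stand-in for rdd.cartesian).
    for user_block in user_blocks:
        for item_block in item_blocks:
            for user_id, scored in score_block_pair(user_block, item_block):
                heap = best.setdefault(user_id, [])
                if len(heap) < k:
                    heapq.heappush(heap, scored)
                else:
                    # Push the new pair and drop the current lowest score.
                    heapq.heappushpop(heap, scored)
    # Sort each heap from highest to lowest score.
    return {user_id: [(item_id, score)
                      for score, item_id in sorted(heap, reverse=True)]
            for user_id, heap in best.items()}
```

Keeping only a bounded heap per user means the full user-by-item score matrix is never materialized, which is the point of doing the top-k by key right after the cartesian step.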
Hi.
We are using the ALS model, and would like to get all users and items
scored.
Currently we have these methods:
https://gist.github.com/hamnis/e396854f4654bd46ebe0
We want to be able to distribute the calculations to the slaves so we don't
have to do this on the master.
Is there an efficient