Hi all,

I have a conceptually simple problem: a user-item matrix A with ~2.6M rows
x ~2.8M cols (~65M non-zeros). Running ALS with 20 features factorizes this
in the usual way as A ≈ UM'. Generating top-n (n=100) recommendations for
all users in U is quite a long process though: for every user, it generates
a prediction for every unrated item in M and then takes the top-n (all
in-memory). I'm using the standard ALS `RecommenderJob` for this.
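For concreteness, here's a small sketch of what that per-user step amounts to, assuming the usual dot-product scoring; the matrix sizes and random factors below are just stand-ins for the real ~2.6M x 20 and ~2.8M x 20 factor matrices:

```python
import numpy as np

# Toy factor matrices standing in for U (users) and M (items).
rng = np.random.default_rng(0)
U = rng.standard_normal((1000, 20))   # user feature vectors
M = rng.standard_normal((5000, 20))   # item feature vectors
n = 100

def top_n_for_user(u, M, n):
    """Score every item for one user, then keep the n best."""
    scores = M @ u                           # one prediction per item
    top = np.argpartition(-scores, n)[:n]    # O(items) partial selection
    return top[np.argsort(-scores[top])]     # order just the n winners

recs = top_n_for_user(U[0], M, n)
```

Even with the partial sort, the `M @ u` step still touches every item for every user, which is where the ~80 hours goes.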

Considering that there are ~2.6M users and ~2.8M items, this is a really
time-consuming way to find the top-n recommendations for all users in U. I
feel like there could be a way to avoid computing predictions for every
item per user, but I can't find any reference in papers about improving
this. At the moment the estimate (with 10 mappers running the
`RecommenderJob`) is ~80 hours. When I think about this problem, I wonder
whether kNN or locality-sensitive hashing (min-hashing) would help:
basically find the nearest neighbours directly and calculate predictions
only on those items, not on every item in M. On the flip side, I could
start to reduce the item space, since it's quite large, by removing items
with low in-degrees, since these probably don't contribute much to the
final recommendations. I don't like this so much though, as it could remove
some of the long-tail recommendations. At least, that is my intuition :)
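To make the LSH idea concrete, here is a minimal sketch of one common variant (random-hyperplane hashing over the item feature vectors), not Mahout's implementation; the number of hash bits and all sizes are illustrative assumptions. Each user would then only score the items sharing its bucket:

```python
import numpy as np

# Toy factors standing in for U and M; 8 hash bits => up to 256 buckets.
rng = np.random.default_rng(1)
U = rng.standard_normal((1000, 20))
M = rng.standard_normal((5000, 20))
planes = rng.standard_normal((20, 8))   # random hyperplanes

def signature(X, planes):
    """Sign pattern of X against each hyperplane, packed into an int key."""
    return ((X @ planes) > 0) @ (1 << np.arange(planes.shape[1]))

# Bucket every item by its signature (one pass over M).
buckets = {}
for idx, sig in enumerate(signature(M, planes)):
    buckets.setdefault(int(sig), []).append(idx)

def candidates_for_user(u):
    """Items in the user's bucket -- score only these, not all of M."""
    return buckets.get(int(signature(u[None, :], planes)[0]), [])

cands = candidates_for_user(U[0])
```

In practice you'd use several hash tables and union their buckets to control recall, but the point is that the candidate set per user shrinks from ~2.8M to a bucket's worth of items.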

Thoughts anyone?

Thanks in advance,

Josh
