Without any tricks, yes, you have to do this much work to know exactly which
are the largest values in UM' for every row. There's no obvious twist that
speeds it up.
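For reference, the brute-force scoring amounts to the following; a minimal NumPy sketch with toy sizes (the shapes and names are mine, not anything from Mahout's `RecommenderJob`):

```python
import numpy as np

# Toy factor matrices: U is users x features, M is items x features.
# In the real problem these would be ~2.6M and ~2.8M rows with 20 features.
rng = np.random.default_rng(0)
U = rng.normal(size=(1000, 20))
M = rng.normal(size=(5000, 20))
n = 100

# Score every item for every user, then keep the top-n per row.
# This users-x-items product is exactly the work being complained about.
scores = U @ M.T

# argpartition is O(items) per row vs O(items log items) for a full sort;
# it returns the top-n indices unordered (sort those n afterwards if needed).
top_n = np.argpartition(-scores, n, axis=1)[:, :n]
```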

(Do you really want to compute recommendations for all users? How many of
the 2.6M are likely to be active soon, or ever?)

First, usually only a subset of all items is recommendable anyway. You don't
want them out of the model, but you don't need to consider them at
prediction time. This is domain-specific of course, but if 90% of the items
are "out of stock" or something similar, you can simply not score them in
the first place.
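Concretely, that just means masking the item factor matrix before scoring; a small sketch, where the "in stock" flag is a hypothetical domain filter:

```python
import numpy as np

rng = np.random.default_rng(1)
U = rng.normal(size=(100, 20))
M = rng.normal(size=(1000, 20))

# Hypothetical domain filter: only ~10% of items are currently recommendable.
in_stock = rng.random(1000) < 0.1
M_active = M[in_stock]                  # score only these rows of M
active_ids = np.flatnonzero(in_stock)   # map local indices back to item ids

# A ~10x smaller scoring problem, before doing anything clever.
scores = U @ M_active.T
n = 10
top_local = np.argpartition(-scores, n, axis=1)[:, :n]
top_items = active_ids[top_local]
```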

Yes, LSH is exactly what I do as well. You hash the item feature vectors
into buckets and then iterate over only the nearby buckets to find
candidates. You can avoid looking at 90+% of candidates this way, with
little if any impact on the top N.
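A minimal sketch of that bucketing idea using random-hyperplane (cosine) LSH; the number of hyperplanes and the "nearby = Hamming distance 1" rule are illustrative assumptions, not tuned values:

```python
import numpy as np

rng = np.random.default_rng(2)
n_items, k = 2000, 20
M = rng.normal(size=(n_items, k))   # item feature vectors from the model
planes = rng.normal(size=(8, k))    # 8 random hyperplanes -> 256 buckets

def bucket_of(vectors):
    # Sign pattern against each hyperplane, packed into an integer key.
    bits = (vectors @ planes.T) > 0
    return bits @ (1 << np.arange(8))

# Group item ids by bucket, once, up front.
buckets = {}
for i, b in enumerate(bucket_of(M)):
    buckets.setdefault(int(b), []).append(i)

# For one user vector, gather candidates from the user's own bucket plus
# the buckets whose key differs by one bit, instead of scoring all items.
u = rng.normal(size=k)
key = int(bucket_of(u[None, :])[0])
candidates = list(buckets.get(key, []))
for bit in range(8):
    candidates += buckets.get(key ^ (1 << bit), [])

scores = M[candidates] @ u
top = [candidates[i] for i in np.argsort(-scores)[:10]]
```

With 256 buckets and 9 of them inspected, you'd expect to score only a few percent of the items; more hyperplanes means fewer candidates per bucket but a higher chance of missing true neighbours, so it's a recall/speed knob to tune.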

Pruning is indeed third on the list, but usually the points above already
get the problem down to a manageable size.
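For completeness, pruning by item in-degree is just a thresholded count over the columns of A; a sketch on a toy interaction log (the threshold of 5 is an arbitrary assumption, and this is the step that risks cutting long-tail recommendations):

```python
import numpy as np

rng = np.random.default_rng(3)
n_items = 1000
# Toy interaction log: one entry per (user, item) non-zero in A.
item_ids = rng.integers(0, n_items, size=20000)

# In-degree of each item = number of interactions it appears in.
in_degree = np.bincount(item_ids, minlength=n_items)

# Hypothetical cutoff: drop items seen fewer than 5 times. Tune this
# against how much of the long tail you're willing to lose.
keep = in_degree >= 5
kept_ids = np.flatnonzero(keep)
```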



On Tue, Mar 5, 2013 at 9:15 PM, Josh Devins <[email protected]> wrote:

> Hi all,
>
> I have a conceptually simple problem. A user-item matrix, A, whose
> dimensions are ~2.6M rows x ~2.8M cols (~65M non-zeros). Running ALS with
> 20 features reduces this in the usual way to A = UM'. Trying to generate
> top-n (where n=100) recommendations for all users in U is quite a long
> process though. Essentially, for every user, it's generating a prediction
> for all unrated items in M then taking the top-n (all in-memory). I'm using
> the standard ALS `RecommenderJob` for this.
>
> Considering that there are ~2.6M users and ~2.8M items, this is a really,
> really, time consuming way to find the top-n recommendations for all users
> in U. I feel like there could be a tricky way to avoid having to compute
> all item predictions of a user though. I can't find any reference in papers
> about improving this but at the moment, the estimate (with 10 mappers
> running the `RecommenderJob`) is ~80 hours. When I think about this problem
> I wonder if applying kNN or locality-sensitive (min-)hashing would somehow help
> me. Basically find the nearest neighbours directly and calculate
> predictions on those items only and not every item in M. On the flip side,
> I could start to reduce the item space, since it's quite large, basically
> start removing items that have low in-degrees since these probably don't
> contribute too much to the final recommendations. I don't like this so much
> though as it could remove some of the long-tail recommendations. At least,
> that is my intuition :)
>
> Thoughts anyone?
>
> Thanks in advance,
>
> Josh
>
