Here we go: https://issues.apache.org/jira/browse/SPARK-4823
On Wed, Dec 10, 2014 at 9:01 PM, Debasish Das
wrote:
I added code to compute the topK products for each user and the topK users
for each product in SPARK-3066.
That is different from a row similarity calculation, as we need both user and
product factors to calculate the topK recommendations.
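To illustrate why both factor sets are needed, here is a minimal sketch in plain Python (not the SPARK-3066 code). It assumes each user and product has a small dense factor vector and that a recommendation score is the dot product of the two; the function name `top_k_products` and the sample ids are hypothetical.

```python
import heapq

def top_k_products(user_factors, product_factors, k):
    """For each user id, return the k product ids with the largest dot-product score."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    recs = {}
    for user, uf in user_factors.items():
        # Score every product against this user's factor vector.
        scored = ((dot(uf, pf), product) for product, pf in product_factors.items())
        # Keep only the k best scores instead of sorting all products.
        recs[user] = [product for _, product in heapq.nlargest(k, scored)]
    return recs

# Hypothetical toy factors, rank 2.
user_factors = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
product_factors = {"p1": [0.9, 0.1], "p2": [0.2, 0.8], "p3": [0.5, 0.5]}
recs = top_k_products(user_factors, product_factors, k=2)
```

The topK users for each product follow from the same function with the two arguments swapped.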
For (1) and (2) we are trying to answer similarUsers to given a use
It's not cheap to compute row similarities when there are many rows, as
it amounts to computing AA^T for the matrix A, which has one entry for
every pair of rows and is expensive.
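A rough sketch of the all-pairs computation, in plain Python, shows where the cost comes from: with m rows there are m(m-1)/2 pairs to score. Cosine similarity is an assumption here (the thread only says "row similarities"), the rows are assumed to be non-zero, and the function name is hypothetical.

```python
import math
from itertools import combinations

def row_cosine_similarities(rows):
    """Cosine similarity for every pair of rows: the normalized off-diagonal of AA^T."""
    norms = [math.sqrt(sum(x * x for x in r)) for r in rows]
    sims = {}
    # One dot product per pair of rows, so the work grows quadratically with len(rows).
    for i, j in combinations(range(len(rows)), 2):
        dot = sum(a * b for a, b in zip(rows[i], rows[j]))
        sims[(i, j)] = dot / (norms[i] * norms[j])
    return sims

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sims = row_cosine_similarities(A)  # 3 rows give 3 pairs; 1,000,000 rows would give about 5e11
```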
There is a JIRA to track handling (1) and (2) more efficiently than
computing all pairs: https://issues.apache.org/jira/brow