Hi,
For the last couple of days I have been trying hard to get around this
problem. Please share any insights on solving it.
Problem :
There is a huge list of (key, value) pairs. I want to transform this into
(key, distinct values) and then eventually into (key, distinct values count).
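
To make the intended transformation concrete, here is a minimal sketch in plain Python (not Spark). The function names and the sample data are made up for illustration, and it assumes everything fits in memory, which is exactly what does not hold for the real data set:

```python
from collections import defaultdict


def distinct_values_by_key(pairs):
    """Group (key, value) pairs into {key: set of distinct values}."""
    grouped = defaultdict(set)
    for key, value in pairs:
        # A set silently drops repeated values for the same key.
        grouped[key].add(value)
    return dict(grouped)


def distinct_count_by_key(pairs):
    """Return {key: number of distinct values seen for that key}."""
    return {
        key: len(values)
        for key, values in distinct_values_by_key(pairs).items()
    }


# Hypothetical sample data, for illustration only.
sample_pairs = [("a", 1), ("a", 2), ("a", 1), ("b", 3)]
distinct_values = distinct_values_by_key(sample_pairs)
distinct_counts = distinct_count_by_key(sample_pairs)
```

Keeping a full set per key is the expensive part. If only the counts are needed, one option is to deduplicate the (key, value) pairs first and then count the remaining pairs per key, so that no per-key set has to be held at once.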
On sma
hould be much more performant at the cost of some accuracy.
>
>
> On Sat, Jun 14, 2014 at 1:58 PM, Vivek YS wrote:
>
>> Hi,
>>For last couple of days I have been trying hard to get around this
>> problem. Please share any insights on solving this problem.
>>
ters. It
>>> can also be a problem if you do not have enough disk space, meaning that
>>> you have to unpersist at the right points if you are running long flows.
>>>
>>> For us, even though the disk writes are a performance hit, we prefer the
>>> Spark
No, I am sure the items match, because userCluster and productCluster are
prepared from "data". The cross product of userCluster and productCluster is a
superset of "data".
On Thu, May 1, 2014 at 3:41 PM, Mayur Rustagi wrote:
> Mostly none of the items in PairRDD match your input. Hence the error.
>