You may need to persist r1 after the partitionBy call. A second join will
then be more efficient.
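
A minimal sketch of what I mean, assuming a local SparkContext and some
made-up sample data. The third RDD r3, the object name and the app name are
hypothetical and only there to show a second join reusing the persisted,
already-partitioned r1:

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PersistPartitionedJoin {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode, purely for illustration.
    val conf = new SparkConf()
      .setAppName("persist-partitioned-join")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)

    try {
      // Hypothetical sample data standing in for the real RDDs.
      val r1: RDD[(String, String)] = sc.parallelize(Seq("a" -> "1", "b" -> "2"))
      val r2: RDD[(String, String)] = sc.parallelize(Seq("a" -> "x", "b" -> "y"))
      val r3: RDD[(String, String)] = sc.parallelize(Seq("a" -> "p", "c" -> "q"))

      val partNum = 80
      val partitioner = new HashPartitioner(partNum)

      // Shuffle r1 once and keep the partitioned result cached.
      val r1Part = r1.partitionBy(partitioner).persist()

      // Both joins use the same partitioner, so r1Part does not need
      // to be shuffled again; only r2 and r3 are shuffled.
      val res1 = r1Part.join(r2, partitioner)
      val res2 = r1Part.join(r3, partitioner)

      res1.collect().foreach(println)
      res2.collect().foreach(println)

      r1Part.unpersist()
    } finally {
      sc.stop()
    }
  }
}
```

Without the persist() call, the partitionBy shuffle of r1 may be recomputed
for each action that depends on it, so this only pays off if r1 is joined
more than once.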

On Mon, Nov 16, 2015 at 2:48 PM, Rishi Mishra <rmis...@snappydata.io> wrote:

> AFAIK, and from what I can see in the code, both of them should behave the same.
>
> On Sat, Nov 14, 2015 at 2:10 AM, Alexander Pivovarov <apivova...@gmail.com> wrote:
>
>> Hi Everyone
>>
>> Is there any difference in performance between the following two joins?
>>
>>
>> val r1: RDD[(String, String)] = ???
>> val r2: RDD[(String, String)] = ???
>>
>> val partNum = 80
>> val partitioner = new HashPartitioner(partNum)
>>
>> // Join 1
>> val res1 = r1.partitionBy(partitioner).join(r2.partitionBy(partitioner))
>>
>> // Join 2
>> val res2 = r1.join(r2, partNum)
>>
>>
>>
>
>
> --
> Regards,
> Rishitesh Mishra,
> SnappyData . (http://www.snappydata.io/)
>
> https://in.linkedin.com/in/rishiteshmishra
>
