You may need to persist r1 after the partitionBy call; a second join against it will then be more efficient.
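A minimal sketch of what persisting after `partitionBy` looks like. It assumes a local Spark 2.x+ `SparkContext`, and the object name, the `joinTwice` helper, the sample data and the extra RDD `r3` are all hypothetical. The partitioned copy of `r1` is cached, and each join is handed the same `partitioner`, so Spark treats `r1Part` as already co-partitioned and only shuffles the other side. After the first join runs, the second one can read `r1Part` from the cache rather than recomputing it.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object PersistPartitionedJoin {
  type Joined = Array[(String, (String, String))]

  // Hypothetical helper: partitions r1 once, caches it, and joins it twice.
  def joinTwice(sc: SparkContext, partNum: Int = 80): (Joined, Joined) = {
    val partitioner = new HashPartitioner(partNum)

    // Hypothetical sample data standing in for the RDDs from the question.
    val r1: RDD[(String, String)] = sc.parallelize(Seq("a" -> "1", "b" -> "2", "c" -> "3"))
    val r2: RDD[(String, String)] = sc.parallelize(Seq("a" -> "x", "b" -> "y"))
    val r3: RDD[(String, String)] = sc.parallelize(Seq("c" -> "z"))

    // Shuffle r1 once and keep the hash-partitioned result in memory.
    val r1Part = r1.partitionBy(partitioner).persist(StorageLevel.MEMORY_ONLY)

    try {
      // Same partitioner as r1Part, so r1Part needs no further shuffle;
      // only r2 is shuffled to line up with it.
      val first = r1Part.join(r2, partitioner).collect().sortBy(_._1)
      // The second join reuses the cached partitions of r1Part.
      val second = r1Part.join(r3, partitioner).collect().sortBy(_._1)
      (first, second)
    } finally {
      r1Part.unpersist()
    }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("persist-partitioned-join").setMaster("local[2]"))
    try {
      val (first, second) = joinTwice(sc)
      println(first.mkString(", "))
      println(second.mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

Passing `partitioner` explicitly, rather than relying on `join`'s default partitioner selection, makes the co-partitioning intent visible in the code.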
On Mon, Nov 16, 2015 at 2:48 PM, Rishi Mishra <rmis...@snappydata.io> wrote:

> AFAIK, and from what I can see in the code, both of them should behave the same.
>
> On Sat, Nov 14, 2015 at 2:10 AM, Alexander Pivovarov <apivova...@gmail.com> wrote:
>
>> Hi Everyone,
>>
>> Is there any difference in performance between the following two joins?
>>
>> import org.apache.spark.HashPartitioner
>> import org.apache.spark.rdd.RDD
>>
>> val r1: RDD[(String, String)] = ???
>> val r2: RDD[(String, String)] = ???
>>
>> val partNum = 80
>> val partitioner = new HashPartitioner(partNum)
>>
>> // Join 1: hash-partition both sides explicitly, then join
>> val res1 = r1.partitionBy(partitioner).join(r2.partitionBy(partitioner))
>>
>> // Join 2: let join hash-partition both sides into partNum partitions
>> val res2 = r1.join(r2, partNum)
>
> --
> Regards,
> Rishitesh Mishra,
> SnappyData (http://www.snappydata.io/)
>
> https://in.linkedin.com/in/rishiteshmishra
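One way to check the "both should behave the same" answer to the question is to count the shuffle dependencies in each result's lineage. This sketch assumes a local Spark 2.x+ `SparkContext`; the object name, the `shuffleCount` and `shuffleCounts` helpers and the sample data are hypothetical. In Join 1, each `partitionBy` adds one shuffle, and the join itself is then a narrow dependency because both sides already share `partitioner`. In Join 2, `join(r2, partNum)` builds an equivalent `HashPartitioner(partNum)` and shuffles both unpartitioned inputs. Either way each input is shuffled once, so both lineages should contain two shuffles.

```scala
import org.apache.spark.{HashPartitioner, ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.collection.mutable

object CompareJoins {
  // Hypothetical helper: counts ShuffleDependency edges reachable from `rdd`,
  // visiting each RDD in the lineage graph at most once.
  def shuffleCount(rdd: RDD[_]): Int = {
    val seen = mutable.Set.empty[Int]
    def visit(r: RDD[_]): Int =
      if (!seen.add(r.id)) 0
      else r.dependencies.map {
        case s: ShuffleDependency[_, _, _] => 1 + visit(s.rdd)
        case d => visit(d.rdd)
      }.sum
    visit(rdd)
  }

  // Builds both joins from the question over hypothetical sample data.
  def shuffleCounts(sc: SparkContext, partNum: Int = 80): (Int, Int) = {
    val r1: RDD[(String, String)] = sc.parallelize(Seq("a" -> "1", "b" -> "2"))
    val r2: RDD[(String, String)] = sc.parallelize(Seq("a" -> "x", "c" -> "y"))
    val partitioner = new HashPartitioner(partNum)

    // Join 1: explicit partitionBy on both sides, then a co-partitioned join.
    val res1 = r1.partitionBy(partitioner).join(r2.partitionBy(partitioner))
    // Join 2: join creates a HashPartitioner(partNum) internally.
    val res2 = r1.join(r2, partNum)

    (shuffleCount(res1), shuffleCount(res2))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("compare-joins").setMaster("local[2]"))
    try {
      val (join1, join2) = shuffleCounts(sc)
      println(s"Join 1 shuffles: $join1, Join 2 shuffles: $join2")
    } finally {
      sc.stop()
    }
  }
}
```

Equal counts support the answer for a one-off join; the `partitionBy` form only pays off when its partitioned result is persisted and reused, which is the point of the top reply.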