Re: Plan issue with spark 1.5.2

2016-04-06 Thread Darshan Singh
> You shouldn't see any more shuffle if it works. > > Yong > > -- > Date: Wed, 6 Apr 2016 22:11:38 +0100 > Subject: Re: Plan issue with spark 1.5.2 > From: darshan.m...@gmail.com > To: java8...@hotmail.com > CC: user@spark.apache.org > >

RE: Plan issue with spark 1.5.2

2016-04-06 Thread Yong Zhang
the joined fields. You shouldn't see any more shuffle if it works. Yong Date: Wed, 6 Apr 2016 22:11:38 +0100 Subject: Re: Plan issue with spark 1.5.2 From: darshan.m...@gmail.com To: java8...@hotmail.com CC: user@spark.apache.org Thanks for the information. When I mention map side join. I

Re: Plan issue with spark 1.5.2

2016-04-06 Thread Darshan Singh
> I am not sure I understand the map side join question you have. If you > have one DF very small, and the other one is much bigger, then you want to try > map join. But you already partitioned both DFs, why do you want to map-side > join then? > > Yong > > ----------

RE: Plan issue with spark 1.5.2

2016-04-06 Thread Yong Zhang
her one is much bigger, then you want to try map join. But you already partitioned both DFs, why do you want to map-side join then? Yong Date: Wed, 6 Apr 2016 21:03:16 +0100 Subject: Re: Plan issue with spark 1.5.2 From: darshan.m...@gmail.com To: java8...@hotmail.com CC: user@spark.apache.org Thanks
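The map-side join mentioned in this exchange can be illustrated with a minimal plain-Python sketch. It is not Spark code and does not use any Spark API; the function name `map_side_join` and the sample rows are hypothetical. The idea is that the small side is held in memory as a lookup table, so the large side is streamed once and never has to be repartitioned.

```python
from collections import defaultdict


def map_side_join(small_rows, large_rows, key):
    """Join two lists of dicts on `key` without repartitioning the large side."""
    # Build an in-memory lookup table from the small side (the "broadcast" copy).
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    # Stream the large side once, probing the lookup table for matches.
    joined = []
    for row in large_rows:
        for match in lookup.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data: a small "movies" table and a larger "ratings" table.
movies = [{"movie_id": 1, "title": "A"}, {"movie_id": 2, "title": "B"}]
ratings = [
    {"movie_id": 1, "rating": 5},
    {"movie_id": 2, "rating": 3},
    {"movie_id": 1, "rating": 4},
]
result = map_side_join(movies, ratings, "movie_id")
```

This only pays off when one side fits comfortably in memory, which is why it is a different strategy from partitioning both sides on the join key.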

Re: Plan issue with spark 1.5.2

2016-04-06 Thread Darshan Singh
his > case), but I think Spark will sort both DFs again, even if you already > partitioned them. > > Yong > > ------ > Date: Wed, 6 Apr 2016 20:10:14 +0100 > Subject: Re: Plan issue with spark 1.5.2 > From: darshan.m...@gmail.com > To: java8...@

RE: Plan issue with spark 1.5.2

2016-04-06 Thread Yong Zhang
level in 1.5.x. If this is wrong, please let me know. The execution plan is in fact doing SortMerge (which is correct in this case), but I think Spark will sort both DFs again, even if you already partitioned them. Yong Date: Wed, 6 Apr 2016 20:10:14 +0100 Subject: Re: Plan issue with spark 1.5.2

Re: Plan issue with spark 1.5.2

2016-04-06 Thread Darshan Singh
I used 1.5.2. I have used the movies data to reproduce the issue. Below is the physical plan. I am not sure why it is hash partitioning the data, then sorting, and then joining. I expect the data to be joined first and then sent for further processing. I sort of expect a common partitioner which will work
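The "common partitioner" expectation in this message can be pictured with a small plain-Python sketch. It is a conceptual illustration only, not Spark's planner or API; the helper names `hash_partition` and `partitionwise_join` and the sample rows are hypothetical. When both sides are split with the same hash function and the same number of partitions, rows with equal keys land in the same bucket index, so each bucket pair can be joined locally without moving data again.

```python
def hash_partition(rows, key, num_partitions):
    """Place each row in the bucket chosen by hash(key) % num_partitions."""
    buckets = [[] for _ in range(num_partitions)]
    for row in rows:
        buckets[hash(row[key]) % num_partitions].append(row)
    return buckets


def partitionwise_join(left_parts, right_parts, key):
    """Join bucket i of the left side only with bucket i of the right side."""
    joined = []
    for left_bucket, right_bucket in zip(left_parts, right_parts):
        for left_row in left_bucket:
            for right_row in right_bucket:
                if left_row[key] == right_row[key]:
                    joined.append({**left_row, **right_row})
    return joined


# Hypothetical sample data keyed by an integer movie_id.
movies = [{"movie_id": i, "title": "movie-%d" % i} for i in range(1, 5)]
ratings = [{"movie_id": i % 4 + 1, "rating": i % 5 + 1} for i in range(10)]

# Both sides use the same key and the same partition count (the shared partitioner).
movie_parts = hash_partition(movies, "movie_id", 3)
rating_parts = hash_partition(ratings, "movie_id", 3)
copartitioned = partitionwise_join(movie_parts, rating_parts, "movie_id")
```

Whether Spark 1.5.2 recognizes an existing partitioning and skips the exchange is exactly what the physical plan in this thread is meant to show; the sketch only describes the expected behavior.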

RE: Plan issue with spark 1.5.2

2016-04-05 Thread Yong Zhang
You need to show us the execution plan, so we can understand what your issue is. Use the spark shell code to show how your DFs are built and how you partition them, then use explain(true) on your joined DF, and show the output here, so we can better help you. Yong > Date: Tue, 5 Apr 2016 09:46:59 -0700