the joined fields.
You shouldn't see any more shuffle if it works.
Yong
Date: Wed, 6 Apr 2016 22:11:38 +0100
Subject: Re: Plan issue with spark 1.5.2
From: darshan.m...@gmail.com
To: java8...@hotmail.com
CC: user@spark.apache.org
Thanks for the information. When I mention map side join, I
I am not sure I understand the map side join question you have. If you have one DF that is very small and the other one is much bigger, then you want to try a map join. But you already partitioned both DFs, so why do you want a map-side join then?

Yong
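For reference, in Spark 1.5.x a map-side join can be requested with the `broadcast` hint from `org.apache.spark.sql.functions`. A minimal spark-shell sketch, assuming hypothetical DataFrames `small` and `large` joined on a `key` column:

```scala
import org.apache.spark.sql.functions.broadcast

// Hypothetical DataFrames; `small` must fit in each executor's memory.
// broadcast() hints the planner to ship `small` to every node, so the
// join can run map-side without shuffling `large`.
val joined = large.join(broadcast(small), "key")

// The physical plan should then show a broadcast hash join rather than
// SortMergeJoin.
joined.explain(true)
```

This only pays off when one side is genuinely small; for two large, pre-partitioned DFs the sort-merge path discussed above is the expected plan.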
Date: Wed, 6 Apr 2016 21:03:16 +0100
Subject: Re: Plan issue with spark 1.5.2
From: darshan.m...@gmail.com
To: java8...@hotmail.com
CC: user@spark.apache.org
Thanks
level in 1.5.x. If this is wrong, please let me know.

The execution plan is in fact doing SortMerge (which is correct in this case), but I think Spark will sort both DFs again, even if you already partitioned them.

Yong
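Yong's point that partitioning is only tracked at the RDD level in 1.5.x can be illustrated with a `HashPartitioner` on pair RDDs. A hedged sketch, with hypothetical pair RDDs `rdd1` and `rdd2` keyed by the join column:

```scala
import org.apache.spark.HashPartitioner

// Hypothetical pair RDDs (RDD[(K, V)]) keyed by the join column.
val p = new HashPartitioner(8)
val left  = rdd1.partitionBy(p)   // one shuffle here, then co-located
val right = rdd2.partitionBy(p)

// Because both RDDs now share the same partitioner, this join is
// planned as a narrow dependency with no further shuffle. A DataFrame
// join in 1.5.x cannot see such a guarantee and will plan its own
// Exchange + Sort under SortMergeJoin.
val joined = left.join(right)
```

This is the RDD-level behaviour the thread contrasts with the DataFrame plan; it is a sketch of the general mechanism, not the poster's actual code.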
Date: Wed, 6 Apr 2016 20:10:14 +0100
Subject: Re: Plan issue with spark 1.5.2
From: darshan.m...@gmail.com
To: java8...@hotmail.com
CC: user@spark.apache.org
I used 1.5.2. I have used the movies data to reproduce the issue. Below is the physical plan. I am not sure why it is hash partitioning the data, then sorting, and then joining. I expect the data to be joined first and then sent for further processing.

I sort of expect a common partitioner which will work
You need to show us the execution plan, so we can understand what your issue is.
Use the spark shell code to show how your DF is built and how you partition them,
then use explain(true) on your join DF and show the output here, so we can
better help you.

Yong
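A minimal spark-shell sketch of the kind of repro being asked for here, with hypothetical input paths and column names (`sqlContext` is provided by the 1.5.x shell):

```scala
// Hypothetical inputs; adjust paths and columns to your own data.
val movies  = sqlContext.read.json("movies.json")
val ratings = sqlContext.read.json("ratings.json")

val joined = movies.join(ratings, movies("movieId") === ratings("movieId"))

// explain(true) prints the parsed, analyzed, optimized, and physical
// plans; the physical plan (Exchange / Sort / SortMergeJoin operators)
// is the part to paste into the thread.
joined.explain(true)
```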
> Date: Tue, 5 Apr 2016 09:46:59 -0700