Thank you for your reply, I have tried to add broadcast hint to the base table,
but it just cannot be broadcast out.
> On Jun 30, 2017, at 9:13 PM, Yong Zhang <java8...@hotmail.com> wrote:
>
> Or since you already use the DataFrame API, instead of SQL, you can add the
> broadcast function to force it.
>
> https://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/sql/functions.html#broadcast(org.apache.spark.sql.DataFrame)
>
> <https://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/sql/functions.html#broadcast(org.apache.spark.sql.DataFrame)>
>
> Yong
> functions - Apache Spark
> <https://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/sql/functions.html#broadcast(org.apache.spark.sql.DataFrame)>
> spark.apache.org <http://spark.apache.org/>
> Computes the numeric value of the first character of the string column, and
> returns the result as a int column.
>
>
>
>
> From: Bryan Jeffrey <bryan.jeff...@gmail.com>
> Sent: Friday, June 30, 2017 6:57 AM
> To: d...@spark.org; user@spark.apache.org; paleyl
> Subject: Re: about broadcast join of base table in spark sql
>
> Hello.
>
> If you want to allow broadcast join with larger broadcasts you can set
> spark.sql.autoBroadcastJoinThreshold to a higher value. This will cause the
> plan to allow join despite 'A' being larger than the default threshold.
>
> Get Outlook for Android <https://aka.ms/ghei36>
>
>
> From: paleyl
> Sent: Wednesday, June 28, 10:42 PM
> Subject: about broadcast join of base table in spark sql
> To: d...@spark.org, user@spark.apache.org
>
>
> Hi All,
>
>
> Recently I meet a problem in broadcast join: I want to left join table A and
> B, A is the smaller one and the left table, so I wrote
>
> A = A.join(B,A("key1") === B("key2"),"left")
>
> but I found that A is not broadcast out, as the shuffle size is still very
> large.
>
> I guess this is a designed mechanism in spark, so could anyone please tell me
> why it is designed like this? I am just very curious.
>
>
> Best,
>
>
> Paley