Thank you for your reply, but when I remove the left join option(like A = A.join(B,A("key1") === B("key2"))), it can be broadcast out. there is no reason spark cannot get table size when left join option is chosen on.
On Sun, Jul 2, 2017 at 1:55 PM, Xiaoye Sun <sunxiaoy...@gmail.com> wrote: > you may need to check if spark can get the size of your table. If spark > cannot get the table size, it won't do broadcast. > > On Sat, Jul 1, 2017 at 11:37 PM Paley Louie <paley2...@gmail.com> wrote: > >> Thank you for your reply, I have tried to add broadcast hint to the base >> table, but it just cannot be broadcast out. >> >> On Jun 30, 2017, at 9:13 PM, Yong Zhang <java8...@hotmail.com> wrote: >> >> Or since you already use the DataFrame API, instead of SQL, you can add >> the broadcast function to force it. >> >> https://spark.apache.org/docs/1.6.2/api/java/org/apache/ >> spark/sql/functions.html#broadcast(org.apache.spark.sql.DataFrame) >> >> Yong >> functions - Apache Spark >> <https://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/sql/functions.html#broadcast(org.apache.spark.sql.DataFrame)> >> spark.apache.org >> Computes the numeric value of the first character of the string column, >> and returns the result as a int column. >> >> >> >> >> ------------------------------ >> *From:* Bryan Jeffrey <bryan.jeff...@gmail.com> >> *Sent:* Friday, June 30, 2017 6:57 AM >> *To:* d...@spark.org; user@spark.apache.org; paleyl >> *Subject:* Re: about broadcast join of base table in spark sql >> >> Hello. >> >> If you want to allow broadcast join with larger broadcasts you can set >> spark.sql.autoBroadcastJoinThreshold to a higher value. This will cause >> the plan to allow join despite 'A' being larger than the default threshold. >> >> >> Get Outlook for Android <https://aka.ms/ghei36> >> >> >> >> From: paleyl >> Sent: Wednesday, June 28, 10:42 PM >> Subject: about broadcast join of base table in spark sql >> To: d...@spark.org, user@spark.apache.org >> >> >> Hi All, >> >> >> Recently I meet a problem in broadcast join: I want to left join table A >> and B, A is the smaller one and the left table, so I wrote >> >> A = A.join(B,A("key1") === B("key2"),"left") >> >> but I found that A is not broadcast out, as the shuffle size is still >> very large. >> >> I guess this is a designed mechanism in spark, so could anyone please >> tell me why it is designed like this? I am just very curious. >> >> >> Best, >> >> >> Paley >> >> >> -- Peili Lv Department of Pattern Recognition and Intelligent System School of Automation Northwestern Polytechnical University NPU Chang'an Campus Building of Automation #130 http://paley.mydiscussion.net/