Thank you for your reply, I have tried to set parameter spark.sql.autoBroadcastJoinThreshold to high enough value, however it does not work, I think broadcast of base table is disabled in spark.
> On Jun 30, 2017, at 6:57 PM, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote: > > Hello. > > If you want to allow broadcast join with larger broadcasts you can set > spark.sql.autoBroadcastJoinThreshold to a higher value. This will cause the > plan to allow join despite 'A' being larger than the default threshold. > > Get Outlook for Android <https://aka.ms/ghei36> > > > From: paleyl > Sent: Wednesday, June 28, 10:42 PM > Subject: about broadcast join of base table in spark sql > To: d...@spark.org, user@spark.apache.org > > > Hi All, > > > Recently I meet a problem in broadcast join: I want to left join table A and > B, A is the smaller one and the left table, so I wrote > > A = A.join(B,A("key1") === B("key2"),"left") > > but I found that A is not broadcast out, as the shuffle size is still very > large. > > I guess this is a designed mechanism in spark, so could anyone please tell me > why it is designed like this? I am just very curious. > > > Best, > > > Paley > > >