What I discovered is that if I write the table being joined to disk and then read it back, Spark correctly broadcasts it. I believe the cause is that Spark's size estimate for the smaller table is far larger than its actual size, so the planner concludes the table exceeds the broadcast threshold and falls back to a SortMergeJoin. Writing the table to disk and reading it back gives Spark an accurate size, derived from the files themselves, and it then goes ahead with a broadcast join.
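For anyone hitting the same thing, here is a rough sketch of how to check the estimate and apply the workaround (the paths, the DataFrame names, and the join key "id" are made up, and the stats API shown is the Spark 2.3+ form):

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.broadcast

  val spark = SparkSession.builder().appName("broadcast-join-check").getOrCreate()

  val large = spark.read.parquet("/data/large_table")
  val small = spark.read.parquet("/data/small_table")

  // Inspect Spark's size estimate for the smaller side. If this exceeds
  // spark.sql.autoBroadcastJoinThreshold (10 MB by default), the planner
  // will choose a SortMergeJoin instead of broadcasting.
  println(small.queryExecution.optimizedPlan.stats.sizeInBytes)

  // The workaround described above: materialize the small table and re-read
  // it, so the size estimate comes from the actual file sizes on disk.
  small.write.mode("overwrite").parquet("/tmp/small_table_materialized")
  val smallMaterialized = spark.read.parquet("/tmp/small_table_materialized")

  val joined = large.join(smallMaterialized, Seq("id"))

An alternative that avoids the disk round trip entirely is to hint the broadcast explicitly, e.g. large.join(broadcast(small), Seq("id")), which tells the planner to broadcast regardless of its size estimate.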
