Hi,

How does Spark decide internally whether to use a BroadcastHashJoin or a
SortMergeJoin? Is there any way to guide it from outside, or through
options, as to which join to use?
In my case, when I try to do a join, Spark plans it as a
BroadcastHashJoin internally. When the join is actually executed, it
waits for the broadcast to finish (the broadcast data is big), which
results in a timeout.
I do not want to increase the timeout value, i.e.
"spark.sql.broadcastTimeout". I would rather have the join done as a
SortMergeJoin. How can I enforce that?
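
For reference, one setting that bears on this choice is
"spark.sql.autoBroadcastJoinThreshold": Spark considers a broadcast join
when the estimated size of one side is below this threshold, and a value
of -1 disables automatic broadcasting. The snippet below is only a
minimal sketch of that idea, assuming the Spark 2.x SparkSession API; the
object name, the local master, and the two tiny DataFrames are
hypothetical stand-ins for the real job and tables.

```scala
import org.apache.spark.sql.SparkSession

object ForceSortMergeJoin {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("force-sort-merge-join")   // hypothetical app name
      .master("local[*]")                 // assumption: local run for illustration
      // -1 turns off automatic broadcast joins, so the planner should
      // no longer pick BroadcastHashJoin based on size estimates.
      .config("spark.sql.autoBroadcastJoinThreshold", "-1")
      .getOrCreate()

    import spark.implicits._

    // Hypothetical stand-ins for the real tables being joined.
    val left  = Seq((1, "a"), (2, "b")).toDF("id", "l")
    val right = Seq((1, "x"), (2, "y")).toDF("id", "r")

    // Print the physical plan to check which join strategy was chosen.
    left.join(right, "id").explain()

    spark.stop()
  }
}
```

The same property can also be passed at submit time with
--conf spark.sql.autoBroadcastJoinThreshold=-1, which leaves
"spark.sql.broadcastTimeout" untouched.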

Thanks
Ravi



