Hi,

How does Spark decide internally when to use a BroadcastHashJoin versus a SortMergeJoin? Is there any way to guide it from outside, or through options, as to which join to use? In my case, when I try to do a join, Spark plans it as a BroadcastHashJoin, and when the join is actually executed it waits for the broadcast to finish (the data being broadcast is large), which results in a timeout. I do not want to increase the timeout value, i.e. "spark.sql.broadcastTimeout"; I would rather have the join done as a SortMergeJoin. How can I enforce that?

Thanks,
Ravi