The join selection is described in https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L92 . If your query has join keys (i.e. it is an equi-join), you can set `spark.sql.autoBroadcastJoinThreshold` to -1 to disable broadcast joins. Hash joins are then used in queries.
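
As a rough sketch (not taken from the Spark docs; the object name and the toy DataFrames are made up for illustration, and a Spark 2.x `SparkSession` is assumed), disabling broadcast joins for a session and checking which join the planner picks could look like this:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; the name and data are for illustration only.
object DisableBroadcastJoin {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("disable-broadcast-join")
      .master("local[*]") // assumption: local run for experimenting
      .getOrCreate()
    import spark.implicits._

    // -1 turns off automatic broadcast joins for this session.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    // Two small tables joined on the key column "id" (an equi-join).
    val left = Seq((1, "a"), (2, "b")).toDF("id", "l")
    val right = Seq((1, "x"), (3, "y")).toDF("id", "r")

    // Print the physical plan to see which join operator was selected.
    left.join(right, "id").explain()

    spark.stop()
  }
}
```

With the threshold at -1 the plan should no longer contain a broadcast hash join; which of the remaining join operators shows up depends on the rules in the join selection code linked above.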
// maropu

On Tue, Jul 5, 2016 at 4:23 AM, Lalitha MV <lalitham...@gmail.com> wrote:

> Hi maropu,
>
> Thanks for your reply.
>
> Would it be possible to write a rule for this, to make it always pick
> shuffle hash join, over other join implementations(i.e. sort merge and
> broadcast)?
>
> Is there any documentation demonstrating rule based transformation for
> physical plan trees?
>
> Thanks,
> Lalitha
>
> On Sat, Jul 2, 2016 at 12:58 AM, Takeshi Yamamuro <linguin....@gmail.com>
> wrote:
>
>> Hi,
>>
>> No, spark has no hint for the hash join.
>>
>> // maropu
>>
>> On Fri, Jul 1, 2016 at 4:56 PM, Lalitha MV <lalitham...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> In order to force broadcast hash join, we can set
>>> the spark.sql.autoBroadcastJoinThreshold config. Is there a way to enforce
>>> shuffle hash join in spark sql?
>>>
>>> Thanks,
>>> Lalitha
>>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>
> --
> Regards,
> Lalitha
>

--
---
Takeshi Yamamuro