My questions are below. Thank you.

1. When joining DataFrames, is it a good idea to repartition on the key
column used in the join, or is the optimizer smart enough that this is
unnecessary?

2. With RDD joins, wherever possible we do a reduceByKey before the join
to avoid a big shuffle of data. Do we need to do anything similar with
DataFrame joins, or is the optimizer smart enough that this is
unnecessary?
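
To make question 2 concrete, here is a minimal plain-Python sketch of the
RDD pattern I mean. It does not use Spark at all; the helper names
(reduce_by_key, join, totals_join_first, totals_reduce_first) and the sample
data are made up for illustration, and it assumes the customers side has one
row per key. It only shows that reducing first sends fewer rows into the join
while giving the same result.

```python
from collections import defaultdict

def reduce_by_key(pairs):
    """Sum values per key, similar in spirit to rdd.reduceByKey(lambda a, b: a + b)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def join(left, right):
    """Inner join of two lists of (key, value) pairs, similar in spirit to rdd.join."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(k, (lv, rv)) for k, lv in left for rv in index.get(k, [])]

def totals_join_first(orders, customers):
    """Join every order row, then aggregate. Returns (result, rows sent into the join)."""
    joined = join(orders, customers)
    totals = {}
    for key, (amount, name) in joined:
        previous = totals.get(key, (0, name))[0]
        totals[key] = (previous + amount, name)
    return totals, len(orders)

def totals_reduce_first(orders, customers):
    """Aggregate to one row per key, then join. Returns (result, rows sent into the join)."""
    reduced = list(reduce_by_key(orders).items())
    joined = join(reduced, customers)
    return dict(joined), len(reduced)

# Hypothetical sample data: (customer_key, amount) and (customer_key, name).
orders = [("a", 10), ("a", 5), ("b", 7), ("a", 1)]
customers = [("a", "Alice"), ("b", "Bob")]

result_1, rows_1 = totals_join_first(orders, customers)
result_2, rows_2 = totals_reduce_first(orders, customers)
print(result_1 == result_2, rows_1, rows_2)
```

In Spark the saving would show up as less data shuffled for the join; what I
would like to know is whether I have to arrange this by hand for DataFrames
too.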
