Re: Spark SQL and Skewed Joins

Jon Walton Tue, 16 Jun 2015 08:29:38 -0700

On Fri, Jun 12, 2015 at 9:43 PM, Michael Armbrust <[email protected]>
wrote:


>> 2. Does 1.3.2 or 1.4 have any enhancements that can help? I tried to use
>> 1.3.1 but SPARK-6967 prohibits me from doing so. Now that 1.4 is
>> available, would any of the JOIN enhancements help this situation?
>>
>
> I would try Spark 1.4 after running "SET
> spark.sql.planner.sortMergeJoin=true".  Please report back if this works
> for you.
>


Hi Michael,

This does help. The joins are faster and fewer executors are lost, but the
same core problem seems to remain: a single executor handles most of the
join (the skewed key) and bottlenecks the job.
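
To illustrate why switching the join strategy alone may not remove the
bottleneck, here is a rough sketch in plain Python (not Spark code). It
assumes rows are assigned to partitions by a hash of the join key; the key
names, row counts, and partition count are made up for the example.

```python
from collections import Counter
import zlib

def partition_sizes(keys, num_partitions):
    """Count how many rows land in each partition under hash partitioning."""
    sizes = Counter()
    for key in keys:
        # A stable hash: every row with the same key maps to the same partition.
        part = zlib.crc32(str(key).encode("utf-8")) % num_partitions
        sizes[part] += 1
    return sizes

# Hypothetical fact-table keys: one hot key plus 1,000 evenly spread keys.
fact_keys = ["hot"] * 9000 + ["k%d" % i for i in range(1000)]
sizes = partition_sizes(fact_keys, num_partitions=8)
largest = max(sizes.values())
print("largest partition holds %.0f%% of rows" % (100.0 * largest / len(fact_keys)))
```

Whichever partition receives the hot key ends up with at least 9,000 of the
10,000 rows, no matter how many partitions there are.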

One idea I had was to split the dimension table into two parts: a small
part containing the skewed keys, which can be broadcast, and a large part
with evenly distributed keys, which can be sort-merge joined. I would then
run two separate queries against the large fact table and union the
results. Does this sound like a worthwhile approach?
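
The split described above can be sketched in plain Python to check that the
union of the two joins returns the same rows as a single join. This is only
an illustration of the logic, not Spark code: hash_join and split_join are
hypothetical helpers, the sample rows are made up, and the skewed keys are
assumed to be known in advance. The sketch also filters the fact table by
key for each half, on the assumption that otherwise the hot-key rows would
still be shuffled to one partition in the sort-merge half.

```python
from collections import defaultdict

def hash_join(fact_rows, dim_rows):
    """Inner-join (key, value) fact rows to (key, attr) dimension rows on key."""
    lookup = defaultdict(list)
    for key, attr in dim_rows:
        lookup[key].append(attr)
    return [(key, value, attr)
            for key, value in fact_rows
            for attr in lookup.get(key, [])]

def split_join(fact_rows, dim_rows, skewed_keys):
    """Join the skewed and non-skewed keys separately, then union the results."""
    skewed = set(skewed_keys)
    dim_small = [r for r in dim_rows if r[0] in skewed]      # would be broadcast
    dim_large = [r for r in dim_rows if r[0] not in skewed]  # would be sort-merge joined
    fact_skewed = [r for r in fact_rows if r[0] in skewed]
    fact_rest = [r for r in fact_rows if r[0] not in skewed]
    return hash_join(fact_skewed, dim_small) + hash_join(fact_rest, dim_large)

# Made-up sample data: key "hot" dominates the fact table.
fact = [("hot", i) for i in range(6)] + [("a", 100), ("b", 200), ("c", 300)]
dim = [("hot", "H"), ("a", "A"), ("b", "B")]

single = hash_join(fact, dim)
split = split_join(fact, dim, skewed_keys=["hot"])
print(sorted(single) == sorted(split))
```

Because the two halves cover disjoint sets of keys, the union needs no
de-duplication step.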

Thank you,

Jon
