[ 
https://issues.apache.org/jira/browse/SPARK-51203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Wang updated SPARK-51203:
------------------------------
    Description: 
ForceOptimizeSkewedJoin allows optimizing a skewed join even if doing so
introduces an extra shuffle, but currently it only works for aggregations after
the join, not for aggregations in the children of the join. For example:
{code:java}
HashAggregate
     |
  Exchange 
     |
HashAggregate     Exchange (skewed side)
     |               |
   Sort            Sort
     \               /
      SortMergeJoin{code}
When ForceOptimizeSkewedJoin is enabled, can we introduce an extra shuffle for
the join child so that the skewed join can be optimized? For example:
{code:java}
  HashAggregate
       |
    Exchange 
       |
  HashAggregate     
       |
    Exchange (force extra shuffle)    Exchange (skewed side)
       |                                 |
     Sort                              Sort
       \                                 /
        SortMergeJoin(isSkewJoin = true) {code}
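A minimal, self-contained Java sketch of the proposal, under heavy simplifying assumptions: a plan is modeled as a hypothetical `Node` tree of operator names, and the skew-join precondition is approximated as "each side of the SortMergeJoin is a Sort sitting directly on an Exchange", loosely following the shape the skew-join rule looks for. The class and method names (`SkewJoinSketch`, `skewJoinApplicable`, `forceExtraShuffle`, `isSkewJoinAfterRule`) are illustrative only and are not Spark APIs; the real rule also checks partition sizes and weighs the cost of the extra shuffle, which this sketch ignores.
{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not Spark's API: a plan is a tree of operator names.
public class SkewJoinSketch {
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, Node... children) {
            this.name = name;
            this.children = List.of(children);
        }

        boolean is(String op) {
            return name.equals(op);
        }

        Node firstChild() {
            return children.isEmpty() ? null : children.get(0);
        }
    }

    // Simplified precondition: every side of the SortMergeJoin must be a Sort
    // sitting directly on an Exchange, so skewed partitions can be split on
    // one side and the matching partitions replicated on the other.
    static boolean skewJoinApplicable(Node join) {
        if (!join.is("SortMergeJoin")) {
            return false;
        }
        for (Node side : join.children) {
            Node child = side.firstChild();
            if (!side.is("Sort") || child == null || !child.is("Exchange")) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical "force" step: put an extra Exchange under every Sort that
    // is not already on top of one (e.g. the Sort above the child aggregation).
    static Node forceExtraShuffle(Node join) {
        List<Node> sides = new ArrayList<>();
        for (Node side : join.children) {
            Node child = side.firstChild();
            if (side.is("Sort") && child != null && !child.is("Exchange")) {
                sides.add(new Node("Sort", new Node("Exchange", child)));
            } else {
                sides.add(side);
            }
        }
        return new Node(join.name, sides.toArray(new Node[0]));
    }

    // Returns whether the join would end up with isSkewJoin = true.
    static boolean isSkewJoinAfterRule(Node join, boolean forceOptimizeSkewedJoin) {
        if (skewJoinApplicable(join)) {
            return true;
        }
        return forceOptimizeSkewedJoin && skewJoinApplicable(forceExtraShuffle(join));
    }

    // Counts Exchange nodes, i.e. shuffles, in the (sub)plan.
    static int countExchanges(Node node) {
        int count = node.is("Exchange") ? 1 : 0;
        for (Node child : node.children) {
            count += countExchanges(child);
        }
        return count;
    }

    // The plan from the description: an aggregation under the left join child.
    static Node examplePlan() {
        Node left = new Node("Sort",
                new Node("HashAggregate",
                        new Node("Exchange", new Node("HashAggregate"))));
        Node right = new Node("Sort", new Node("Exchange")); // skewed side
        return new Node("SortMergeJoin", left, right);
    }

    public static void main(String[] args) {
        Node plan = examplePlan();
        System.out.println(isSkewJoinAfterRule(plan, false));         // false
        System.out.println(isSkewJoinAfterRule(plan, true));          // true
        System.out.println(countExchanges(forceExtraShuffle(plan)));  // 3
    }
}
{code}
In this toy model, the aggregation under the left child blocks the skew-join pattern when forcing is off; with forcing on, the sketch inserts the extra Exchange and the join qualifies, at the cost of one more shuffle (two Exchanges become three).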

  was:
ForceOptimizeSkewedJoin allows optimizing a skewed join even if doing so
introduces an extra shuffle, but currently it only works for aggregations after
the join, not for aggregations in the children of the join. For example:
{code:java}
HashAggregate
     |
  Exchange 
     |
HashAggregate     Exchange (skewed side)
     |               |
   Sort            Sort
     \               /
 SortMergeJoin(isSkewJoin = true) {code}
When ForceOptimizeSkewedJoin is enabled, can we introduce an extra shuffle for
the join child so that the skewed join can be optimized? For example:
{code:java}
  HashAggregate
       |
    Exchange 
       |
  HashAggregate     
       |
    Exchange (force extra shuffle)    Exchange (skewed side)
       |                                 |
     Sort                              Sort
       \                                 /
        SortMergeJoin(isSkewJoin = true) {code}


> ForceOptimizeSkewedJoin does not take effect for child aggregations in skewed 
> join
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-51203
>                 URL: https://issues.apache.org/jira/browse/SPARK-51203
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Zhen Wang
>            Priority: Major
>              Labels: pull-request-available
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
