[ https://issues.apache.org/jira/browse/SPARK-14026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-14026:
---------------------------------
    Labels: bulk-closed  (was: )

> Subquery not broadcast
> -----------------------
>
>                 Key: SPARK-14026
>                 URL: https://issues.apache.org/jira/browse/SPARK-14026
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 1.6.0
>            Reporter: Younes
>            Priority: Major
>              Labels: bulk-closed
>
> The subquery is not broadcast and generates a very large shuffle:
>
>     Select cnt, tab3.*
>     from (Select count(1) cnt, col4 from tab1 join tab2 on col1=col2 group by col4)
>     join tab3 on (col4=col3);
> The result set of the following subquery is very small, yet it is not broadcast and the join creates a huge shuffle:
>
>     Select count(1) cnt, col4 from tab1 join tab2 on col1=col2 group by col4
>
> When I persisted the subquery first and then ran the same query, it worked fine.
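To illustrate what the reporter expected to happen, here is a minimal pure-Python sketch of a broadcast hash join. It is a simulation, not Spark's implementation. The schema tab1(col1, col4), tab2(col2), tab3(col3, x) and all row values are hypothetical, because the report does not say which table holds col4. tab3 is modeled as a list of partitions. The small aggregated result becomes a dict that every tab3 partition reads locally, so no tab3 row moves between partitions. A shuffle join, by contrast, would repartition both sides by the join key.

```python
from collections import Counter

# Hypothetical schema (assumption, not from the report):
#   tab1(col1, col4), tab2(col2), tab3(col3, x)


def small_aggregate(tab1, tab2):
    """Simulates: Select count(1) cnt, col4 from tab1 join tab2 on col1=col2 group by col4."""
    # How many tab2 rows carry each join key.
    tab2_key_counts = Counter(row["col2"] for row in tab2)
    counts = Counter()
    for row in tab1:
        # An inner join emits one output row per matching tab2 row.
        counts[row["col4"]] += tab2_key_counts.get(row["col1"], 0)
    # Groups with no joined rows do not appear in the result.
    return {col4: cnt for col4, cnt in counts.items() if cnt > 0}


def broadcast_hash_join(small, tab3_partitions):
    """Joins every tab3 partition against the same in-memory copy of `small`.

    Each partition is processed where it is, so tab3 is never repartitioned;
    only `small` (the "broadcast" side) is copied to every partition.
    """
    joined = []
    for partition in tab3_partitions:
        local_rows = []
        for row in partition:
            cnt = small.get(row["col3"])  # hash lookup on col4 = col3
            if cnt is not None:
                local_rows.append({"cnt": cnt, **row})
        joined.append(local_rows)
    return joined


# Hypothetical sample data.
tab1 = [
    {"col1": 1, "col4": "a"},
    {"col1": 2, "col4": "a"},
    {"col1": 3, "col4": "b"},
    {"col1": 4, "col4": "c"},
]
tab2 = [{"col2": 1}, {"col2": 1}, {"col2": 3}]
tab3_partitions = [
    [{"col3": "a", "x": 10}, {"col3": "c", "x": 11}],
    [{"col3": "b", "x": 12}],
]

small = small_aggregate(tab1, tab2)
joined = broadcast_hash_join(small, tab3_partitions)
print(small)  # {'a': 2, 'b': 1}
```

In Spark itself, this strategy is chosen when the planner's size estimate for one side falls below `spark.sql.autoBroadcastJoinThreshold`. It can also be requested explicitly with `broadcast()` from `org.apache.spark.sql.functions`, which is available since 1.5. The report does not say why persisting helped. One plausible reason is that a materialized cached relation reports its actual in-memory size to the planner, while the size estimate for the un-persisted aggregate is much larger.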



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
