[ 
https://issues.apache.org/jira/browse/HIVE-24376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24376:
------------------------------------
        Parent: HIVE-24384
    Issue Type: Sub-task  (was: Improvement)

> SharedWorkOptimizer may retain the SJ filter condition during RemoveSemijoin  
> mode
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-24376
>                 URL: https://issues.apache.org/jira/browse/HIVE-24376
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Zoltan Haindrich
>            Priority: Major
>
> The mode name is also a bit confusing, but here is what happens:
> {code}
> TS[A1] -> ...
> TS[A2] -> JOIN
> TS[B] -> JOIN
> {code}
> We have an SJ edge TS[B] -> TS[A2] that communicates information about
> the join keys; let's assume the reduction ratio is r.
> RemoveSemijoin right now does the following:
> * removes the semijoin edge (so TS[A2] will become a full scan)
> * merges TS[A1] and TS[A2]
> With respect to reading data from disk, this is great: we accessed A twice,
> one of which was a full scan, and now we only read it once.
> But from a row traffic perspective, TS[A2] emits more rows from now on because
> we no longer have the semijoin reduction with ratio r.
>  
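The trade-off described above can be put into numbers with a rough sketch. This is not Hive code: the `ScanCost` class, both functions, the table size, and the value of r are hypothetical, and r is assumed here to be the fraction of rows of A that survive the semijoin filter. The sketch also pessimistically assumes that TS[A2] read all of A from disk before the merge.

```python
from dataclasses import dataclass


@dataclass
class ScanCost:
    rows_read: int     # rows read from disk across all scans of A
    rows_to_join: int  # rows the A2 branch emits toward the JOIN


def before_merge(rows_a: int, r: float) -> ScanCost:
    # TS[A1] is a full scan; TS[A2] is assumed to read all of A as well
    # and to emit only the fraction r that passes the semijoin filter.
    return ScanCost(rows_read=2 * rows_a, rows_to_join=round(rows_a * r))


def after_merge(rows_a: int) -> ScanCost:
    # One shared scan of A, but the semijoin filter is gone, so every
    # row of A travels down the branch that feeds the JOIN.
    return ScanCost(rows_read=rows_a, rows_to_join=rows_a)


# Illustrative numbers only: 1M rows in A, 1% surviving the semijoin.
before = before_merge(1_000_000, 0.01)
after = after_merge(1_000_000)
print("before:", before)
print("after: ", after)
```

Under these assumptions the merge halves the rows read from disk but multiplies the rows reaching the JOIN by 1/r, which is the regression this issue describes.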



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
