[ https://issues.apache.org/jira/browse/HIVE-24376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zoltan Haindrich updated HIVE-24376:
------------------------------------
        Parent: HIVE-24384
    Issue Type: Sub-task  (was: Improvement)

> SharedWorkOptimizer may retain the SJ filter condition during RemoveSemijoin mode
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-24376
>                 URL: https://issues.apache.org/jira/browse/HIVE-24376
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Zoltan Haindrich
>            Priority: Major
>
> The mode name is also a bit confusing, but here is what happens:
> {code}
> TS[A1] -> ...
> TS[A2] -> JOIN
> TS[B] -> JOIN
> {code}
> There is a semijoin (SJ) edge from TS[B] to TS[A2] that communicates information
> about the join keys; let's assume its reduction ratio is r.
> RemoveSemijoin currently does the following:
> * removes the semijoin edge (so TS[A2] becomes a full scan)
> * merges TS[A1] and TS[A2]
> With respect to data read from disk this is great: we accessed A twice, one of
> which was a full scan, and now we read it only once.
> From a row-traffic perspective, however, TS[A2] emits more rows from now on,
> because the semijoin reduction with ratio r no longer applies.
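The trade-off described in the issue can be illustrated with a small back-of-the-envelope model. This is only a sketch under stated assumptions, not Hive code: `PlanCost` and the three functions are hypothetical names, `r` is assumed to be the fraction of A's rows that the semijoin filter removes, and the third variant is one possible reading of the issue title (keeping the SJ condition as a filter on the merged scan's A2 branch).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanCost:
    """Toy cost summary (hypothetical, not a Hive structure)."""
    scans_of_a: int      # how many times table A is read from disk
    rows_from_a2: float  # rows the TS[A2] branch forwards to the JOIN


def with_semijoin(n_rows: int, r: float) -> PlanCost:
    # Original plan: TS[A1] and TS[A2] scan A separately,
    # and the SJ edge from TS[B] trims the rows emitted by TS[A2].
    # Assumption: r is the fraction of rows removed by the semijoin.
    return PlanCost(scans_of_a=2, rows_from_a2=n_rows * (1 - r))


def after_remove_semijoin(n_rows: int, r: float) -> PlanCost:
    # RemoveSemijoin as described in the issue: the SJ edge is dropped and
    # the two scans are merged, so A is read once but every row is emitted.
    return PlanCost(scans_of_a=1, rows_from_a2=float(n_rows))


def merged_with_retained_filter(n_rows: int, r: float) -> PlanCost:
    # Assumed reading of the issue title: merge the scans, but keep the
    # SJ condition as a filter on the branch that used to be TS[A2].
    return PlanCost(scans_of_a=1, rows_from_a2=n_rows * (1 - r))


if __name__ == "__main__":
    n, r = 1000, 0.75  # illustrative numbers only
    for plan in (with_semijoin, after_remove_semijoin, merged_with_retained_filter):
        print(plan.__name__, plan(n, r))
```

In this model the merged plan halves the number of scans of A, but the A2 branch goes from `n * (1 - r)` rows back to `n` rows unless the filter condition is kept. Which effect dominates depends on `r` and on the relative cost of disk reads versus row traffic, neither of which the issue quantifies.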