[ 
https://issues.apache.org/jira/browse/HIVE-24812?focusedWorklogId=557076&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-557076
 ]

ASF GitHub Bot logged work on HIVE-24812:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Feb/21 15:57
            Start Date: 24/Feb/21 15:57
    Worklog Time Spent: 10m 
      Work Description: kgyrtkirk commented on pull request #2006:
URL: https://github.com/apache/hive/pull/2006#issuecomment-785178016


   @jcamachor an alternate approach could be, instead of disabling it, to restrict the 
optimization to target only the case where:
   ```
   TS_SJ -> [...] -> JOIN
   TS_FILTERED -> [...] -> JOIN -> [...]
   TS_X -> [...]
   ```
   `TS_X` and `TS_FILTERED` scan the same table - those are the scans being optimized; 
`TS_SJ` is the table from which the SJ filter is computed.
   
   * in the above case, if there is no `RS -> MAPJOIN` in the 
`TS_FILTERED -> [...] -> JOIN` path, then the optimization is unlikely to do much harm...
   * or more generally - if there is no `RS` on that path, it can definitely go 
ahead and make the optimization
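
The more general rule above (no `RS` anywhere on the `TS_FILTERED -> [...] -> JOIN` path) could be sketched roughly as below. This is only an illustration under stated assumptions: `Op`, `SemijoinRemovalGuard` and `mayRemoveSemijoin` are hypothetical names made up for the example and are not Hive's operator classes or the SharedWorkOptimizer API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SemijoinRemovalGuard {

    /** Hypothetical stand-in for a plan operator; this is not Hive's Operator class. */
    static final class Op {
        final String kind; // "TS", "FIL", "RS", "JOIN", "MAPJOIN", ...
        final List<Op> children = new ArrayList<>();

        Op(String kind) { this.kind = kind; }

        /** Adds an edge this -> child and returns the child, so chains read left to right. */
        Op to(Op child) { children.add(child); return child; }
    }

    /**
     * Allows the removal only if no path from tsFiltered to join passes through an RS.
     * If join is not reachable at all, this also returns true (nothing to protect).
     */
    static boolean mayRemoveSemijoin(Op tsFiltered, Op join) {
        return !rsOnPath(tsFiltered, join, false, new HashSet<>(), new HashSet<>());
    }

    /** Depth-first walk; an operator is visited at most once per "RS seen so far" state. */
    private static boolean rsOnPath(Op op, Op join, boolean seenRs,
                                    Set<Op> doneWithRs, Set<Op> doneWithoutRs) {
        if (op == join) {
            return seenRs;
        }
        Set<Op> done = seenRs ? doneWithRs : doneWithoutRs;
        if (!done.add(op)) {
            return false; // already explored from this operator in this state
        }
        boolean nowSeen = seenRs || op.kind.equals("RS");
        for (Op child : op.children) {
            if (rsOnPath(child, join, nowSeen, doneWithRs, doneWithoutRs)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // TS_FILTERED -> FIL -> JOIN: no RS on the path, removal may go ahead.
        Op join = new Op("JOIN");
        Op tsFiltered = new Op("TS");
        tsFiltered.to(new Op("FIL")).to(join);
        System.out.println(mayRemoveSemijoin(tsFiltered, join));   // true

        // TS_FILTERED -> RS -> JOIN: the filtered scan is shuffled, keep the SJ.
        Op join2 = new Op("JOIN");
        Op tsShuffled = new Op("TS");
        tsShuffled.to(new Op("RS")).to(join2);
        System.out.println(mayRemoveSemijoin(tsShuffled, join2));  // false
    }
}
```

The narrower first variant (only an `RS` that feeds a `MAPJOIN` blocks the removal) would need the walk to look at the operator following each `RS` as well; it is left out here to keep the sketch short.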
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 557076)
    Time Spent: 20m  (was: 10m)

> Disable sharedworkoptimizer remove semijoin by default
> ------------------------------------------------------
>
>                 Key: HIVE-24812
>                 URL: https://issues.apache.org/jira/browse/HIVE-24812
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> SJ removal backfired a bit when I was testing stuff - because of the 
> additional opportunities parallel edges may enable; it will increase 
> the shuffled memory amount and/or even make MJ broadcast inputs larger
> set hive.optimize.shared.work.semijoin=false by default for now
> right now it's better to leave dppunion to pick up these cases instead of 
> removing the SJ fully - after HIVE-24376 we might re-enable it 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
