[ 
https://issues.apache.org/jira/browse/HIVE-26968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680908#comment-17680908
 ] 

Seonggon Namgung commented on HIVE-26968:
-----------------------------------------

The attached graphs in [^TPC-DS Query64 OperatorGraph.pdf] show the problem with 
the current SharedWorkOptimizer (SWO).
If hive.optimize.shared.work.extended is set to true, the current SWO merges 
RS[59] and RS[186] because their subtrees are identical except for their DPP 
parents.
After the merge, TS[25] is merged into TS[152], but the DPP edge from 
EVENT[625] to TS[25] is not preserved.
Therefore, TS[152] emits only the records that join with date_dim where 
ss_date_sk = d_date_sk and d_year = 2001.
As a result, MAPJOIN[636], which joins with date_dim where d_year = 2000, emits 
no records, and the query returns an incorrect result.
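
To make the failure mode concrete, here is a toy Java model (not Hive code). 
It uses six made-up date keys and assumes, for illustration only, that TS[25]'s 
branch is the one feeding the d_year = 2000 join. A DPP event ships the set of 
surviving date keys to a scan. When the two scans are merged and only the 
d_year = 2001 filter survives, the d_year = 2000 join can no longer see any 
matching rows. All class and method names are hypothetical.

```java
import java.util.*;
import java.util.stream.*;

public class DppMergeToy {
    // Hypothetical fact rows: each value is a sold-date surrogate key (made-up data).
    static final List<Integer> STORE_SALES_DATE_SK = List.of(1, 2, 3, 4, 5, 6);
    // Hypothetical date_dim: date surrogate key -> year (made-up data).
    static final Map<Integer, Integer> DATE_DIM_YEAR =
        Map.of(1, 2000, 2, 2000, 3, 2000, 4, 2001, 5, 2001, 6, 2001);

    // Keys a DPP event would send to the scan: the date keys of the given year.
    static Set<Integer> dppKeys(int year) {
        return DATE_DIM_YEAR.entrySet().stream()
            .filter(e -> e.getValue() == year)
            .map(Map.Entry::getKey)
            .collect(Collectors.toSet());
    }

    // A scan pruned by a DPP filter: only rows whose key was shipped survive.
    static List<Integer> scan(Set<Integer> dppFilter) {
        return STORE_SALES_DATE_SK.stream()
            .filter(dppFilter::contains)
            .collect(Collectors.toList());
    }

    // Map join of scanned rows with date_dim restricted to one year; returns the output row count.
    static long joinCount(List<Integer> rows, int year) {
        return rows.stream().filter(k -> DATE_DIM_YEAR.get(k) == year).count();
    }

    public static void main(String[] args) {
        // Correct plan: the d_year = 2000 branch has its own scan with its own DPP filter.
        long correct2000 = joinCount(scan(dppKeys(2000)), 2000);
        // Buggy plan: a single merged scan keeps only the d_year = 2001 DPP filter.
        List<Integer> merged = scan(dppKeys(2001));
        long buggy2000 = joinCount(merged, 2000);
        System.out.println(correct2000 + " " + buggy2000); // prints "3 0"
    }
}
```

The merged scan still answers the d_year = 2001 join correctly, which is why 
the wrong plan fails silently instead of erroring.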

The proposed PR makes SWO compare two TS operators with the existing DPP parent 
comparison method when it compares and gathers parent operators.
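
The idea of the fix can be sketched roughly as follows. This is a simplified 
Java model, not Hive's actual SharedWorkOptimizer code: `TableScan`, 
`canMerge`, and `parentsCompatible` are hypothetical names. Each DPP parent is 
represented by a string signature of its pruning predicate, a stand-in for 
Hive's structural comparison of DPP branches. The table name and the 
d_year = 2000 predicate for TS[25] are assumptions for illustration.

```java
import java.util.*;

public class DppParentCheck {
    // Hypothetical, simplified stand-in for a TableScan operator (not Hive's TableScanOperator).
    static final class TableScan {
        final String id;
        final String table;
        // Signatures of the DPP branches that prune this scan (stand-in for a structural comparison).
        final Set<String> dppSignatures;

        TableScan(String id, String table, Set<String> dppSignatures) {
            this.id = id;
            this.table = table;
            this.dppSignatures = Set.copyOf(dppSignatures);
        }
    }

    // Two scans may share work only if they read the same table AND have equivalent DPP parents.
    static boolean canMerge(TableScan a, TableScan b) {
        return a.table.equals(b.table) && a.dppSignatures.equals(b.dppSignatures);
    }

    // While gathering parents of two candidate subtrees, every paired TS must also pass the DPP check.
    static boolean parentsCompatible(List<TableScan> left, List<TableScan> right) {
        if (left.size() != right.size()) {
            return false;
        }
        for (int i = 0; i < left.size(); i++) {
            if (!canMerge(left.get(i), right.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed DPP predicates, for illustration only.
        TableScan ts25 = new TableScan("TS[25]", "store_sales", Set.of("date_dim.d_year = 2000"));
        TableScan ts152 = new TableScan("TS[152]", "store_sales", Set.of("date_dim.d_year = 2001"));
        System.out.println(canMerge(ts25, ts152)); // prints "false"
    }
}
```

With this check in place, RS[59] and RS[186] would no longer be considered 
equivalent, so their TableScan parents would stay separate and each would keep 
its own DPP edge.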

> SharedWorkOptimizer merges TableScan operators that have different DPP parents
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-26968
>                 URL: https://issues.apache.org/jira/browse/HIVE-26968
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Seonggon Namgung
>            Assignee: Seonggon Namgung
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: TPC-DS Query64 OperatorGraph.pdf
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> SharedWorkOptimizer merges TableScan operators that have different DPP 
> parents, which leads to a semantically wrong query plan.
> In our environment, running TPC-DS query64 on a 1TB Iceberg-format table 
> returns no rows because of this problem. (The correct result has 7094 rows.)
> We use hive.optimize.shared.work=true, 
> hive.optimize.shared.work.extended=true, and 
> hive.optimize.shared.work.dppunion=false to reproduce the bug.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
