[ https://issues.apache.org/jira/browse/HIVE-26968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680908#comment-17680908 ]
Seonggon Namgung commented on HIVE-26968:
-----------------------------------------

The attached graphs in [^TPC-DS Query64 OperatorGraph.pdf] show the problem with the current SharedWorkOptimizer (SWO).
If hive.optimize.shared.work.extended is set to true, the current SWO merges RS[59] and RS[186] because they have the same subtree apart from their DPP parents.
After the merge, TS[25] is merged into TS[152], but the DPP edge from EVENT[625] to TS[25] is not preserved.
As a result, TS[152] emits only records that join with date_dim where ss_date_sk = d_date_sk and d_year = 2001.
MAPJOIN[636], which joins with date_dim where d_year = 2000, therefore emits no records, and this leads to an incorrect query result.

The proposed PR compares two TS operators using the existing DPP parent comparison method when SWO compares and gathers parent operators.

> SharedWorkOptimizer merges TableScan operators that have different DPP parents
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-26968
>                 URL: https://issues.apache.org/jira/browse/HIVE-26968
>             Project: Hive
>          Issue Type: Bug
>   Affects Versions: 4.0.0-alpha-2
>            Reporter: Seonggon Namgung
>            Assignee: Seonggon Namgung
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: TPC-DS Query64 OperatorGraph.pdf
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> SharedWorkOptimizer merges TableScan operators that have different DPP
> parents, which leads to the creation of a semantically wrong query plan.
> In our environment, running TPC-DS query64 on a 1TB Iceberg format table
> returns no rows because of this problem. (The correct result has 7094 rows.)
> We use hive.optimize.shared.work=true,
> hive.optimize.shared.work.extended=true, and
> hive.optimize.shared.work.dppunion=false to reproduce the bug.
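
For anyone trying to reproduce this, the three settings named in the description could be applied in a Hive session roughly as follows. This is only a sketch of the session setup: the property names and values are taken from the description above, while issuing them as SET statements at the start of the session is an assumption about how the reporter configured the run. The TPC-DS query64 text and the 1TB Iceberg dataset are not included here.

```sql
-- Settings quoted in the issue description (values as reported).
SET hive.optimize.shared.work=true;
SET hive.optimize.shared.work.extended=true;
SET hive.optimize.shared.work.dppunion=false;

-- Then run TPC-DS query64 against the Iceberg tables.
-- Per the report, the affected plan returns no rows; the correct result has 7094 rows.
```

Since the comment ties the faulty merge of RS[59] and RS[186] to hive.optimize.shared.work.extended=true, setting that property to false may avoid the merge. That is an inference from the comment, not something the report confirms.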