[ 
https://issues.apache.org/jira/browse/HIVE-17193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212502#comment-16212502
 ] 

Rui Li commented on HIVE-17193:
-------------------------------

The main challenge here is deciding whether two DPP works are different. 
In {{CombineEquivalentWorkResolver}}, we visit child tasks before their parents. 
That means when we visit the target map works, we haven't seen the 
corresponding DPP works yet. The simplest solution is: if the IDs of the DPP 
works (tracked by the target map works) are different, we consider the target 
map works different and don't combine them. The drawback is that we'll lose 
some optimization opportunities, although I'm not sure whether two target map 
works can actually share the same DPP work in the current implementation.
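
A minimal sketch of this ID-based check, in Java. The class 
{{TargetMapWorkSketch}}, its {{planSignature}} field and the ID strings are 
hypothetical stand-ins for illustration, not Hive's actual classes; the 
signature only stands in for whatever equivalence check the resolver already 
performs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a target map work; NOT Hive's MapWork class.
final class TargetMapWorkSketch {
    // Assumption: stands in for the resolver's existing equivalence check.
    final String planSignature;
    // IDs of the DPP works that prune this map work.
    final Set<String> dppWorkIds;

    TargetMapWorkSketch(String planSignature, Set<String> dppWorkIds) {
        this.planSignature = planSignature;
        this.dppWorkIds = new HashSet<>(dppWorkIds);
    }

    // Conservative rule: combine only when the plans match AND both map works
    // are pruned by exactly the same DPP works.
    static boolean canCombine(TargetMapWorkSketch a, TargetMapWorkSketch b) {
        return a.planSignature.equals(b.planSignature)
            && a.dppWorkIds.equals(b.dppWorkIds);
    }
}
```

With this rule, two otherwise identical scans of {{srcpart}} that are pruned 
by DPP works with different IDs are never combined, even if those DPP works 
would have produced the same records.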

Another solution is to walk the parent tasks first and combine equivalent DPP 
works. Two DPP works can be considered equivalent as long as they output the 
same records; it shouldn't matter how these records are used to prune 
different tables. As we combine the DPP works, we update the information in 
the target map works accordingly (DPP works hold references to their target 
map works). Then, when we visit the target map works later, we know whether 
they should be combined. 
I'm working on a PoC patch to demonstrate the idea.
[~xuefuz], [~csun], [~stakiar], [~kellyzly] do you have any suggestions?
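
A rough sketch of the second approach, in Java, only to illustrate the 
bookkeeping; it is not the PoC patch. All names ({{DppCombineSketch}}, 
{{DppWorkSketch}}, {{MapWorkSketch}}, {{outputSignature}}) are hypothetical, 
and it assumes that an equal {{outputSignature}} means two DPP works output 
the same records.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DppCombineSketch {
    // Hypothetical target map work: tracks only the IDs of its DPP works.
    static final class MapWorkSketch {
        final Set<String> dppWorkIds = new HashSet<>();
    }

    // Hypothetical DPP work that holds references to its target map works.
    static final class DppWorkSketch {
        final String id;
        // Assumption: equal signatures mean the same output records.
        final String outputSignature;
        final List<MapWorkSketch> targets = new ArrayList<>();

        DppWorkSketch(String id, String outputSignature) {
            this.id = id;
            this.outputSignature = outputSignature;
        }

        void addTarget(MapWorkSketch target) {
            targets.add(target);
            target.dppWorkIds.add(id);
        }
    }

    // Parent-first pass: keep one DPP work per distinct output and re-point
    // the targets of each dropped duplicate at the DPP work that was kept.
    static List<DppWorkSketch> combineEquivalent(List<DppWorkSketch> dppWorks) {
        Map<String, DppWorkSketch> survivors = new LinkedHashMap<>();
        for (DppWorkSketch work : dppWorks) {
            DppWorkSketch kept = survivors.putIfAbsent(work.outputSignature, work);
            if (kept == null) {
                continue; // first DPP work seen with this output
            }
            for (MapWorkSketch target : work.targets) {
                target.dppWorkIds.remove(work.id);
                kept.addTarget(target);
            }
            work.targets.clear();
        }
        return new ArrayList<>(survivors.values());
    }
}
```

After this pass, target map works that were pruned by equivalent DPP works 
carry the same DPP ID, so a plain ID comparison during the later visit of the 
map works is enough to decide whether they can be combined.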

> HoS: don't combine map works that are targets of different DPPs
> ---------------------------------------------------------------
>
>                 Key: HIVE-17193
>                 URL: https://issues.apache.org/jira/browse/HIVE-17193
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
>            Assignee: Rui Li
>
> Suppose {{srcpart}} is partitioned by {{ds}}. The following query can trigger 
> the issue:
> {code}
> explain
> select * from
>   (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) a
> join
>   (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.value) b
> on a.key=b.key;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
