Daniel Becker has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/22746 )
Change subject: IMPALA-13873: Missing equivalence conjunct in aggregation node with inline views
......................................................................

IMPALA-13873: Missing equivalence conjunct in aggregation node with inline views

Some queries involving plain (distinct) UNIONs miss conjuncts, leading
to incorrect results.

Example:
  WITH u1 AS (select 10 a, 10 b),
  t AS (
    select a, b, min(b) over (partition by a) min_b from u1
    UNION
    select 10, 10, 20
  )
  select t.* from t where t.b = t.min_b;

Expected result:
  +----+----+-------+
  | a  | b  | min_b |
  +----+----+-------+
  | 10 | 10 | 10    |
  +----+----+-------+

Actual result:
  +----+----+-------+
  | a  | b  | min_b |
  +----+----+-------+
  | 10 | 10 | 10    |
  | 10 | 20 | 10    |
  +----+----+-------+

This is caused by MultiAggregateInfo assuming that conjuncts bound by
grouping slots that are produced by SlotRef grouping expressions are
already evaluated below the AggregationNode. However, this is not true
in all cases: with UNIONs, there may be conjuncts that are unassigned
below the AggregationNode.

This may happen if a conjunct cannot be pushed into all operands of a
UNION, because the source tuples in the operands do not contain all of
the slots referenced by the predicate. In the example above, it happens
in the first operand:

  select a, b, min(b) over (partition by a) min_b from u1

The source tuple, 'u1', contains only two slots ('a' and 'b'), but does
not contain a slot corresponding to 'min(b)' - therefore the predicate
't.b = t.min_b' is not bound by the tuple of 'u1'. In these cases, the
conjuncts need to be evaluated in the AggregationNode (possibly in
addition to some of the UNION operands).

This change fixes this problem by introducing a method in
MultiAggregateInfo: 'setConjunctsToKeep()', where the caller can pass a
list of conjuncts that will not be eliminated. This is called during the
planning of the UNION if there are unassigned conjuncts remaining.
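
The boundness rule described above can be illustrated with a small toy
model. This is only a sketch and not the code in the patch: the class
ConjunctsToKeepSketch, the Conjunct type and the helper names are
hypothetical, and slots are reduced to plain strings instead of Impala's
analysis classes. It assumes a conjunct can be pushed into a UNION
operand only if the operand's source tuple provides every slot the
conjunct references, and that any conjunct failing this for at least one
operand has to stay in the AggregationNode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ConjunctsToKeepSketch {
  /** Toy conjunct: its SQL text and the names of the slots it references. */
  public static final class Conjunct {
    public final String sql;
    public final Set<String> referencedSlots;

    public Conjunct(String sql, String... slots) {
      this.sql = sql;
      this.referencedSlots = new HashSet<>(Arrays.asList(slots));
    }
  }

  /** True if the source tuple provides every slot the conjunct references. */
  public static boolean isBoundBy(Conjunct c, Set<String> sourceTupleSlots) {
    return sourceTupleSlots.containsAll(c.referencedSlots);
  }

  /**
   * Returns the conjuncts that at least one operand cannot evaluate.
   * In this toy model these are the ones the aggregation must not drop.
   */
  public static List<Conjunct> conjunctsToKeep(
      List<Conjunct> conjuncts, List<Set<String>> operandSourceTuples) {
    List<Conjunct> keep = new ArrayList<>();
    for (Conjunct c : conjuncts) {
      boolean boundEverywhere = true;
      for (Set<String> tuple : operandSourceTuples) {
        if (!isBoundBy(c, tuple)) {
          boundEverywhere = false;
          break;
        }
      }
      if (!boundEverywhere) keep.add(c);
    }
    return keep;
  }

  /** Models only the first operand of the example: source tuple 'u1' with slots a and b. */
  public static List<Conjunct> demo() {
    Set<String> u1 = new HashSet<>(Arrays.asList("a", "b"));
    List<Set<String>> operands = new ArrayList<>();
    operands.add(u1);

    List<Conjunct> conjuncts = new ArrayList<>();
    // References min_b, which 'u1' does not produce.
    conjuncts.add(new Conjunct("t.b = t.min_b", "b", "min_b"));
    // Hypothetical extra predicate that 'u1' can evaluate on its own.
    conjuncts.add(new Conjunct("t.a = 10", "a"));
    return conjunctsToKeep(conjuncts, operands);
  }

  public static void main(String[] args) {
    for (Conjunct c : demo()) {
      System.out.println("keep in aggregation: " + c.sql);
    }
  }
}
```

In this model only 't.b = t.min_b' is reported as a conjunct to keep,
which mirrors the situation the commit message describes for the first
UNION operand.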

Testing:
 - Added a PlannerTest and an EE test for the case where a conjunct was
   previously incorrectly removed from the AggregationNode.
 - Existing tests cover the case when conjuncts can be safely removed
   from an AggregationNode above a UnionNode because the conjuncts are
   pushed into all union operands, see for example
   https://github.com/apache/impala/blob/6f2d9a24d8c014a7dc1ec7a08bcfb025b3bdf41f/testdata/workloads/functional-planner/queries/PlannerTest/union.test#L3914

Change-Id: I67a59cd96d83181ce249fd6ca141906f549a09b3
---
M fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 88 insertions(+), 1 deletion(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/22746/6
--
To view, visit http://gerrit.cloudera.org:8080/22746
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I67a59cd96d83181ce249fd6ca141906f549a09b3
Gerrit-Change-Number: 22746
Gerrit-PatchSet: 6
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <npaptak...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pro...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>