Daniel Becker has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/22746 )

Change subject: IMPALA-13873: Missing equivalence conjunct in aggregation node with inline views
......................................................................

IMPALA-13873: Missing equivalence conjunct in aggregation node with inline views

Some queries involving plain (distinct) UNIONs lose conjuncts during
planning, leading to incorrect results.

Example:
  WITH u1 AS (select 10 a, 10 b),
  t AS (select a, b, min(b) over (partition by a) min_b from u1 UNION
  select 10, 10, 20)
  select t.* from t where t.b = t.min_b;

Expected result:
  +----+----+-------+
  | a  | b  | min_b |
  +----+----+-------+
  | 10 | 10 | 10    |
  +----+----+-------+

Actual result:
  +----+----+-------+
  | a  | b  | min_b |
  +----+----+-------+
  | 10 | 10 | 10    |
  | 10 | 20 | 10    |
  +----+----+-------+

This is caused by MultiAggregateInfo assuming that conjuncts bound by
grouping slots that are produced by SlotRef grouping expressions are
already evaluated below the AggregationNode. However, this is not true
in all cases: with UNIONs, there may be conjuncts that are unassigned
below the AggregationNode.

This may happen if a conjunct cannot be pushed into all operands of a
UNION, because the source tuples in the operands do not contain all of
the slots referenced by the predicate. In the example above, it happens
in the first operand:
  select a, b, min(b) over (partition by a) min_b from u1
The source tuple, 'u1', contains only two slots ('a' and 'b') and no
slot corresponding to 'min(b)', so the predicate 't.b = t.min_b' is not
bound by the tuple of 'u1'.
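
The binding check described above can be illustrated with a minimal,
self-contained sketch. It does not use Impala's classes: 'BoundnessSketch',
'isBoundBy()' and 'pushableIntoAllOperands()' are hypothetical names, slots
are modelled as plain strings, and the second operand is assumed to expose
all three slots.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model, not Impala's analyzer: a predicate is "bound by" a
// tuple when every slot it references is produced by that tuple.
final class BoundnessSketch {
  static boolean isBoundBy(Set<String> predicateSlots, Set<String> tupleSlots) {
    return tupleSlots.containsAll(predicateSlots);
  }

  // A conjunct can be pushed into all operands only if each operand's
  // source tuple binds it.
  static boolean pushableIntoAllOperands(
      Set<String> predicateSlots, List<Set<String>> operandSourceTuples) {
    return operandSourceTuples.stream()
        .allMatch(tuple -> isBoundBy(predicateSlots, tuple));
  }

  public static void main(String[] args) {
    Set<String> predicate = Set.of("b", "min_b");      // t.b = t.min_b
    Set<String> u1 = Set.of("a", "b");                 // first operand's source tuple
    Set<String> constants = Set.of("a", "b", "min_b"); // assumed: second operand
    System.out.println(isBoundBy(predicate, u1));
    System.out.println(pushableIntoAllOperands(predicate, List.of(u1, constants)));
  }
}
```

In this model both calls report false, because 'min_b' is missing from the
source tuple of the first operand.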

In these cases, the conjuncts need to be evaluated in the
AggregationNode (possibly in addition to some of the UNION operands).

This change fixes this problem by introducing a method in
MultiAggregateInfo: 'setConjunctsToKeep()', where the caller can pass a
list of conjuncts that will not be eliminated. This is called during the
planning of the UNION if there are unassigned conjuncts remaining.
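
The idea can be sketched roughly as follows. This is a simplified model
under stated assumptions, not the code in MultiAggregateInfo: only the name
'setConjunctsToKeep()' comes from this change, while its signature, the
class 'AggConjunctSketch', the method 'conjunctsToEvaluate()' and the use
of strings for conjuncts are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the aggregation-side bookkeeping.
final class AggConjunctSketch {
  // Conjuncts the caller asked to retain even if they look redundant.
  private final List<String> conjunctsToKeep = new ArrayList<>();

  // Assumed signature; the real method may differ.
  void setConjunctsToKeep(List<String> conjuncts) {
    conjunctsToKeep.clear();
    conjunctsToKeep.addAll(conjuncts);
  }

  // Returns the conjuncts the aggregation still has to evaluate.
  // 'boundByGroupingSlots' models the conjuncts that are assumed to have
  // been evaluated below the aggregation already.
  List<String> conjunctsToEvaluate(List<String> all, Set<String> boundByGroupingSlots) {
    List<String> result = new ArrayList<>();
    for (String conjunct : all) {
      boolean assumedEvaluatedBelow = boundByGroupingSlots.contains(conjunct);
      if (!assumedEvaluatedBelow || conjunctsToKeep.contains(conjunct)) {
        result.add(conjunct);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> all = List.of("t.b = t.min_b");
    Set<String> bound = Set.of("t.b = t.min_b");
    AggConjunctSketch agg = new AggConjunctSketch();
    System.out.println(agg.conjunctsToEvaluate(all, bound)); // conjunct is eliminated
    agg.setConjunctsToKeep(all); // UNION planning reports it as unassigned
    System.out.println(agg.conjunctsToEvaluate(all, bound)); // conjunct is retained
  }
}
```

Keeping the retained list separate from the elimination rule leaves the
existing behaviour untouched when the caller passes nothing.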

Testing:
 - Added a PlannerTest and an EE test for the case where a conjunct
   was previously incorrectly removed from the AggregationNode.
 - Existing tests cover the case when conjuncts can be safely removed
   from an AggregationNode above a UnionNode because the conjuncts are
   pushed into all union operands; see for example
   https://github.com/apache/impala/blob/6f2d9a24d8c014a7dc1ec7a08bcfb025b3bdf41f/testdata/workloads/functional-planner/queries/PlannerTest/union.test#L3914

Change-Id: I67a59cd96d83181ce249fd6ca141906f549a09b3
---
M fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 88 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/22746/6
--
To view, visit http://gerrit.cloudera.org:8080/22746
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I67a59cd96d83181ce249fd6ca141906f549a09b3
Gerrit-Change-Number: 22746
Gerrit-PatchSet: 6
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <npaptak...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pro...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
