[ https://issues.apache.org/jira/browse/IMPALA-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17948044#comment-17948044 ]
ASF subversion and git services commented on IMPALA-13873:
----------------------------------------------------------
Commit c1aac4b3a4fb616763cedc59648cfde6e8f5ec70 in impala's branch
refs/heads/master from Daniel Becker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c1aac4b3a ]
IMPALA-13873: Missing equivalence conjunct in aggregation node with inline views
Some queries involving plain (distinct) UNIONs lose conjuncts during
planning, which leads to incorrect results.
Example:
  WITH u1 AS (select 10 a, 10 b),
  t AS (select a, b, min(b) over (partition by a) min_b from u1 UNION
        select 10, 10, 20)
  select t.* from t where t.b = t.min_b;
Expected result:
+----+----+-------+
| a  | b  | min_b |
+----+----+-------+
| 10 | 10 | 10    |
+----+----+-------+
Actual result:
+----+----+-------+
| a  | b  | min_b |
+----+----+-------+
| 10 | 10 | 10    |
| 10 | 20 | 10    |
+----+----+-------+
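For reference, the expected result can be reproduced outside Impala. The
sketch below runs the example query unchanged against an in-memory SQLite
database through Python's sqlite3 module. It assumes a SQLite build of 3.25
or newer (needed for window functions) and only illustrates standard SQL
semantics; it says nothing about Impala's planner. The names QUERY and
run_example are made up for this illustration.

```python
import sqlite3

# The example query from the commit message, unchanged.
QUERY = """
WITH u1 AS (select 10 a, 10 b),
t AS (select a, b, min(b) over (partition by a) min_b from u1 UNION
      select 10, 10, 20)
select t.* from t where t.b = t.min_b
"""


def run_example():
    """Run the example query in an in-memory SQLite database (not Impala)."""
    conn = sqlite3.connect(":memory:")  # the query needs no tables
    try:
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Prints the rows that survive the filter t.b = t.min_b.
    print(run_example())
```

A correct engine should return only the row listed under "Expected result".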
This is caused by MultiAggregateInfo assuming that conjuncts bound by
grouping slots that are produced by SlotRef grouping expressions have
already been evaluated below the AggregationNode, and eliminating them.
However, this assumption does not hold in all cases: with UNIONs, some
conjuncts may remain unassigned below the AggregationNode.
This may happen if a conjunct cannot be pushed into all operands of a
UNION, because the source tuples in the operands do not contain all of
the slots referenced by the predicate. In the example above, it happens
in the first operand:
select a, b, min(b) over (partition by a) min_b from u1
The source tuple, 'u1', contains only two slots ('a' and 'b'), but does
not contain a slot corresponding to 'min(b)' - therefore the predicate
't.b = t.min_b' is not bound by the tuple of 'u1'. In theory, the
predicate could still be evaluated directly after materialising the
tuple with 'min(b)', still inside the UNION operand, but Impala
currently does not work that way.
In these cases, the conjuncts need to be evaluated in the
AggregationNode (possibly in addition to some of the UNION operands).
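The boundness argument above can be sketched as a toy model in Python. This
is not Impala code: is_bound, unassigned_operands and the slot sets are
hypothetical names chosen for illustration, and the second operand is simply
assumed to be able to evaluate the predicate because it consists of
constants.

```python
def is_bound(predicate_slots, tuple_slots):
    """A predicate is bound by a tuple if the tuple provides every slot it references."""
    return set(predicate_slots) <= set(tuple_slots)


def unassigned_operands(predicate_slots, operand_source_slots):
    """Return the operands (in order) that the predicate cannot be pushed into."""
    return [name for name, slots in operand_source_slots.items()
            if not is_bound(predicate_slots, slots)]


# Slots referenced by the predicate t.b = t.min_b.
predicate = {"b", "min_b"}

# Toy view of the source tuples of the two UNION operands.
operands = {
    # Source tuple 'u1' has only 'a' and 'b'; 'min(b)' is materialised later.
    "first operand (from u1)": {"a", "b"},
    # Assumption: the constant operand can evaluate the predicate directly.
    "second operand (constants)": {"a", "b", "min_b"},
}

if __name__ == "__main__":
    print(unassigned_operands(predicate, operands))
```

Any operand reported here leaves the conjunct unassigned below the
AggregationNode, which is the situation the fix has to handle.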
This change fixes this problem by introducing a method in
MultiAggregateInfo: 'setConjunctsToKeep()', where the caller can pass a
list of conjuncts that will not be eliminated. This is called during the
planning of the UNION if there are unassigned conjuncts remaining.
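A minimal sketch of the idea behind 'setConjunctsToKeep()', again as a toy
Python model rather than the actual Java implementation. ToyAggInfo,
conjuncts_for_agg_node and the (text, slots) conjunct representation are
hypothetical; only the method name mirrors the one mentioned above. The
model assumes that the aggregation implementing the distinct UNION groups by
all three output slots.

```python
class ToyAggInfo:
    """Toy stand-in for the conjunct elimination described above (not Impala code)."""

    def __init__(self, grouping_slots):
        self.grouping_slots = set(grouping_slots)
        self.conjuncts_to_keep = []

    def set_conjuncts_to_keep(self, conjuncts):
        # Conjuncts listed here are never eliminated.
        self.conjuncts_to_keep = list(conjuncts)

    def conjuncts_for_agg_node(self, conjuncts):
        """Return the conjuncts the aggregation node still has to evaluate."""
        kept = []
        for conjunct in conjuncts:
            _text, slots = conjunct
            bound_by_grouping = set(slots) <= self.grouping_slots
            if bound_by_grouping and conjunct not in self.conjuncts_to_keep:
                # Assumed to be evaluated below the aggregation: drop it.
                continue
            kept.append(conjunct)
        return kept


# A conjunct is modelled as (text, slots it references).
CONJUNCT = ("t.b = t.min_b", frozenset({"b", "min_b"}))

if __name__ == "__main__":
    agg = ToyAggInfo({"a", "b", "min_b"})
    print(agg.conjuncts_for_agg_node([CONJUNCT]))  # conjunct is eliminated
    agg.set_conjuncts_to_keep([CONJUNCT])
    print(agg.conjuncts_for_agg_node([CONJUNCT]))  # conjunct is retained
```

In this model the caller decides which conjuncts are unsafe to eliminate,
matching the description that the method is called during UNION planning
when unassigned conjuncts remain.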
Testing:
 - Added a PlannerTest and an EE test for the case where a conjunct
   was previously incorrectly removed from the AggregationNode.
 - Existing tests cover the case where conjuncts can be safely removed
   from an AggregationNode above a UnionNode because the conjuncts are
   pushed into all union operands. See for example
   https://github.com/apache/impala/blob/6f2d9a2/testdata/workloads/functional-planner/queries/PlannerTest/union.test#L3914
Change-Id: I67a59cd96d83181ce249fd6ca141906f549a09b3
Reviewed-on: http://gerrit.cloudera.org:8080/22746
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Missing equivalence conjunct in aggregation node with inline views
> ------------------------------------------------------------------
>
> Key: IMPALA-13873
> URL: https://issues.apache.org/jira/browse/IMPALA-13873
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 4.5.0
> Reporter: Peter Rozsa
> Assignee: Daniel Becker
> Priority: Major
>
> The following query yields incorrect results:
> select t.* from
>   (select
>      a,
>      b,
>      min(b) over (partition by a) min_b
>    from
>      (
>        select
>          10 a,
>          10 b
>        union
>        select
>          10 a,
>          20 b
>        union
>        select
>          10 a,
>          10 b
>      ) u1
>    union
>    select 10, 10, 20) t
> where t.b = t.min_b;
> Result:
> +----+----+-------+
> | a  | b  | min_b |
> +----+----+-------+
> | 10 | 20 | 10    |
> | 10 | 10 | 20    |
> | 10 | 10 | 10    |
> +----+----+-------+
> Correct result:
> +----+----+-------+
> | a  | b  | min_b |
> +----+----+-------+
> | 10 | 10 | 10    |
> +----+----+-------+
> The underlying cause is that, during analysis, the filter conjunct (t.b =
> t.min_b) is removed at
> [https://github.com/apache/impala/blob/356b7e5ddf7868968fb76ca55a8046d0291388fd/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L2955]
> This conjunct should not be removed.