Hello Aman Sinha, Yida Wu, Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/21955

to look at the new patch set (#15).

Change subject: IMPALA-13465: Trace TupleId further to reduce Agg cardinality
......................................................................

IMPALA-13465: Trace TupleId further to reduce Agg cardinality

IMPALA-13405 added tuple analysis to lower AggregationNode cardinality.
It initially focused on grouping expressions that are simple column
SlotRefs, but the estimate can be improved further by tracing each
expression back to its origin TupleId across views and intermediate
aggregation tuples. This patch implements that deeper TupleId tracing to
achieve further cardinality reduction. With this deeper TupleId
resolution, the TupleId search can now be narrowed to child ScanNodes
and UnionNodes only.
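To illustrate what tracing an origin TupleId means, here is a minimal Java sketch. It is not Impala's implementation: the TupleTracer class, the integer slot and tuple ids, and the "pass-through" link that records a view or aggregation output slot as a plain copy of another slot are all hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model for illustration only; not Impala's SlotDescriptor/TupleId API.
final class TupleTracer {
  // slot id -> id of the tuple that owns the slot
  private final Map<Integer, Integer> slotToTuple = new HashMap<>();
  // slot id -> slot id it copies (set only for pass-through view/aggregation slots)
  private final Map<Integer, Integer> slotSource = new HashMap<>();

  void addSlot(int slotId, int tupleId) {
    slotToTuple.put(slotId, tupleId);
  }

  void addPassThrough(int slotId, int sourceSlotId) {
    slotSource.put(slotId, sourceSlotId);
  }

  /** Follows pass-through links to the originating slot and returns its tuple id. */
  int originTuple(int slotId) {
    Set<Integer> visited = new HashSet<>();
    int current = slotId;
    while (slotSource.containsKey(current)) {
      if (!visited.add(current)) {
        throw new IllegalStateException("cycle at slot " + current);
      }
      current = slotSource.get(current);
    }
    Integer tupleId = slotToTuple.get(current);
    if (tupleId == null) {
      throw new IllegalArgumentException("unknown slot " + current);
    }
    return tupleId;
  }
}
```

For example, if scan slot 0 lives in tuple 0, view slot 10 (tuple 1) copies slot 0, and aggregation slot 20 (tuple 2) copies slot 10, then originTuple(20) returns 0 even though slot 20 itself belongs to a different tuple.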

Note that this optimization still runs ONLY IF at least two grouping
expressions refer to the same TupleId. Running the same optimization
would also help when there is only a single expression per TupleId, but
that work is deferred until TupleId-to-PlanNode mapping can be done
faster, without repeating the plan tree traversal.
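A rough Java sketch of the "at least two grouping expressions per TupleId" condition follows. It assumes the reduction works by capping the combined NDV of grouping expressions from one tuple at that tuple's row count, since distinct value combinations drawn from one tuple cannot outnumber its rows; the class and method names are hypothetical, and NDVs are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names and the capping rule are illustrative, not Impala's code.
final class TupleBasedNdv {
  /** A grouping expression already traced to its origin tuple, with its NDV estimate. */
  static final class GroupingExpr {
    final int originTupleId;
    final long ndv;

    GroupingExpr(int originTupleId, long ndv) {
      this.originTupleId = originTupleId;
      this.ndv = ndv;
    }
  }

  /** Multiplies two non-negative values, saturating at Long.MAX_VALUE. */
  static long saturatingMultiply(long a, long b) {
    if (a == 0 || b == 0) return 0;
    return (a > Long.MAX_VALUE / b) ? Long.MAX_VALUE : a * b;
  }

  /**
   * Estimates the number of groups as the product of grouping NDVs. Only when two or
   * more expressions share an origin tuple is their combined NDV capped at that
   * tuple's row count; a lone expression per tuple is left uncapped.
   */
  static long estimateGroups(List<GroupingExpr> exprs, Map<Integer, Long> tupleRows) {
    Map<Integer, List<GroupingExpr>> byTuple = new LinkedHashMap<>();
    for (GroupingExpr e : exprs) {
      byTuple.computeIfAbsent(e.originTupleId, k -> new ArrayList<>()).add(e);
    }
    long total = 1;
    for (Map.Entry<Integer, List<GroupingExpr>> entry : byTuple.entrySet()) {
      long product = 1;
      for (GroupingExpr e : entry.getValue()) {
        product = saturatingMultiply(product, e.ndv);
      }
      Long rows = tupleRows.get(entry.getKey());
      if (entry.getValue().size() >= 2 && rows != null && rows >= 0) {
        product = Math.min(product, rows);
      }
      total = saturatingMultiply(total, product);
    }
    return total;
  }
}
```

With tuple 0 holding 1000 rows, grouping expressions with NDVs 100 and 50 from tuple 0 are capped at 1000 instead of 5000, while a third expression from another tuple keeps its own NDV of 7, giving 7000 groups. A single expression from tuple 0 with NDV 5000 stays uncapped, mirroring the restriction described above.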

This patch also fixes a bug where the cardinality estimate of a MERGE
phase aggregation was not capped at the output cardinality of its
EXCHANGE node. It also makes tuple-based reduction more conservative by
capping at input cardinality or limit, or by using output cardinality if
the producer node is a UnionNode or has hard estimates.
aggInputCardinality is still indirectly influenced by the predicates and
limits of child nodes.
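The two capping rules can be sketched as below. This is a hypothetical Java illustration, not the code in AggregationNode.java: it uses -1 for an unknown cardinality, and it reads "input cardinality/limit" as the producer node's input cardinality bounded by its limit, which is an assumption.

```java
// Hypothetical sketch of the capping rules; -1 stands for an unknown cardinality here.
final class AggCardinalityCaps {
  /** Caps a MERGE-phase aggregation estimate at the EXCHANGE node's output cardinality. */
  static long capMergeAgg(long mergeAggEstimate, long exchangeOutput) {
    if (mergeAggEstimate < 0) return exchangeOutput;
    if (exchangeOutput < 0) return mergeAggEstimate;
    return Math.min(mergeAggEstimate, exchangeOutput);
  }

  /**
   * Picks the row-count bound used for tuple-based reduction: the producer's output
   * cardinality for a UnionNode or a node with hard estimates, otherwise its input
   * cardinality, further bounded by its limit when one is set.
   */
  static long tupleBound(long producerInput, long producerOutput, long limit,
      boolean isUnionOrHardEstimate) {
    if (isUnionOrHardEstimate) return producerOutput;
    if (limit >= 0 && (producerInput < 0 || limit < producerInput)) return limit;
    return producerInput;
  }
}
```

Under this reading, bounding by the input rather than the post-predicate output cardinality gives a looser bound, which is what makes the reduction more conservative.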

The following PlannerTest files (under
testdata/workloads/functional-planner/queries/PlannerTest/) revert to
their state prior to IMPALA-13405:
tpcds/tpcds-q19.test
tpcds/tpcds-q55.test
tpcds_cpu_cost/tpcds-q03.test
tpcds_cpu_cost/tpcds-q31.test
tpcds_cpu_cost/tpcds-q47.test
tpcds_cpu_cost/tpcds-q52.test
tpcds_cpu_cost/tpcds-q57.test
tpcds_cpu_cost/tpcds-q89.test

Several other planner tests show increased cardinality after this
change, but the numbers remain below their pre-IMPALA-13405 values.

Testing:
- Enable cardinality validation in PlannerTest.testAggregation.
- Move the IMPALA-13405 planner tests to aggregation.test, which is a
  better fit.
- Add a new test case in aggregation.test.
- Pass core tests.

Change-Id: I11f59ccc469c24c1800abaad3774c56190306944
---
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q37.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q55.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q82.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q98.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/ddl.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q19.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q20.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q31.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q37.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q43-verbose.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q43.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q47.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q52.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q55.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q57.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q67.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q82.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q89.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q98.test
29 files changed, 1,656 insertions(+), 1,409 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/21955/15
--
To view, visit http://gerrit.cloudera.org:8080/21955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I11f59ccc469c24c1800abaad3774c56190306944
Gerrit-Change-Number: 21955
Gerrit-PatchSet: 15
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
