Hello Aman Sinha, Yida Wu, Michael Smith, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/21955
to look at the new patch set (#25).
Change subject: IMPALA-13465: Trace TupleId further to reduce Agg cardinality
......................................................................
IMPALA-13465: Trace TupleId further to reduce Agg cardinality
IMPALA-13405 added tuple analysis to lower AggregationNode cardinality.
It started by focusing on simple column SlotRefs, but we can improve on
this by tracing the origin TupleId across views and intermediate
aggregation tuples. This patch implements that deeper TupleId tracing to
achieve further cardinality reduction. With deeper TupleId resolution,
the TupleId search can now be narrowed to child ScanNodes and UnionNodes
only.
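As an illustration only, the idea of tracing a TupleId back to its origin can be sketched as following a chain of "derived from" links until a tuple with no recorded source is reached. The class name, the method name, and the plain integer map below are hypothetical and are not taken from the Impala code base; the actual patch works on Impala's analyzer and planner objects.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Impala code: tuple ids are plain ints and
// sourceOf maps a view or intermediate aggregation tuple id to the tuple id
// it was derived from.
final class TupleTraceSketch {
  /** Follows sourceOf links from tupleId until no further source is known. */
  static int traceToOrigin(int tupleId, Map<Integer, Integer> sourceOf) {
    Set<Integer> seen = new HashSet<>();
    int current = tupleId;
    // seen.add() returns false on a repeated id, which guards against cycles.
    while (sourceOf.containsKey(current) && seen.add(current)) {
      current = sourceOf.get(current);
    }
    return current;
  }
}
```

In this toy model, a tuple produced by a view over an aggregation over a scan would resolve to the scan's tuple id, which is what makes it possible to look only at leaf producers.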
Note that this optimization is still limited to run ONLY IF there are at
least two grouping expressions that refer to the same TupleId. Running
the same optimization would also help when there is only a single
expression per TupleId, but we defer that work until we can provide a
faster TupleId-to-PlanNode mapping that does not repeat the plan tree
traversal.
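The eligibility rule above can be pictured with a small sketch. It assumes each grouping expression has already been resolved to a single origin TupleId, represented here as an int; the class and method names are hypothetical and do not come from AggregationNode.java.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Impala code: each list element is the origin
// tuple id that one grouping expression resolved to.
final class TupleGroupingSketch {
  /** Returns tuple ids referenced by at least two grouping expressions, in first-seen order. */
  static List<Integer> eligibleTupleIds(List<Integer> tupleIdPerGroupingExpr) {
    Map<Integer, Integer> counts = new LinkedHashMap<>();
    for (Integer tid : tupleIdPerGroupingExpr) {
      counts.merge(tid, 1, Integer::sum);
    }
    List<Integer> eligible = new ArrayList<>();
    for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
      // A single expression per tuple id is skipped, matching the stated limitation.
      if (e.getValue() >= 2) eligible.add(e.getKey());
    }
    return eligible;
  }
}
```

With grouping expressions resolving to tuple ids 0, 0 and 1, only tuple id 0 would qualify under this rule.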
This patch also makes tuple-based reduction more conservative by capping
it at the producer node's input cardinality or limit, or by using the
output cardinality if the producer node is a UnionNode or has hard
estimates. aggInputCardinality is still indirectly influenced by the
predicates and limits of child nodes.
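One possible reading of the capping rule, written as a standalone sketch: the per-tuple estimate (here called ndvProduct) is never allowed to exceed what the producer node can emit. All names and parameters are hypothetical, negative values stand for "unknown", and the real logic in AggregationNode.java may differ in detail.

```java
// Hypothetical sketch, not Impala code. Negative cardinalities or limits
// are treated as "unknown" and impose no cap.
final class TupleCapSketch {
  static long capTupleEstimate(long ndvProduct, long inputCard, long limit,
      long outputCard, boolean unionOrHardEstimate) {
    // UnionNode or hard estimates: trust the producer's output cardinality.
    long cap = unionOrHardEstimate ? outputCard : inputCard;
    if (!unionOrHardEstimate && limit >= 0) {
      // Otherwise cap at the smaller of the input cardinality and the limit.
      cap = (cap < 0) ? limit : Math.min(cap, limit);
    }
    return (cap < 0) ? ndvProduct : Math.min(ndvProduct, cap);
  }
}
```

Under these assumptions, an NDV product of 1000 against a producer with input cardinality 500 and no limit is capped to 500, and to 100 if the producer also carries a limit of 100.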
The following PlannerTests (under
testdata/workloads/functional-planner/queries/PlannerTest/) revert their
cardinality estimates to their state prior to IMPALA-13405:
tpcds/tpcds-q19.test
tpcds/tpcds-q55.test
tpcds_cpu_cost/tpcds-q03.test
tpcds_cpu_cost/tpcds-q31.test
tpcds_cpu_cost/tpcds-q47.test
tpcds_cpu_cost/tpcds-q52.test
tpcds_cpu_cost/tpcds-q57.test
tpcds_cpu_cost/tpcds-q89.test
Several other planner tests have increased cardinality after this
change, but the numbers are still below their pre-IMPALA-13405 values.
Testing:
- Pass core tests.
Change-Id: I11f59ccc469c24c1800abaad3774c56190306944
---
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q37.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q55.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q82.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q98.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/ddl.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q19.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q20.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q31.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q37.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q43-verbose.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q43.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q47.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q52.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q55.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q57.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q67.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q82.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q89.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q98.test
28 files changed, 1,218 insertions(+), 1,431 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/21955/25
--
To view, visit http://gerrit.cloudera.org:8080/21955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I11f59ccc469c24c1800abaad3774c56190306944
Gerrit-Change-Number: 21955
Gerrit-PatchSet: 25
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>