Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21955 )
Change subject: IMPALA-13465: Trace TupleId further to reduce Agg cardinality
......................................................................

IMPALA-13465: Trace TupleId further to reduce Agg cardinality

IMPALA-13405 does tuple analysis to lower AggregationNode cardinality.
It begins by focusing on the simple column SlotRef, but we can improve
this further by tracing the origin TupleId across views and intermediate
aggregation tuples. This patch implements deeper TupleId tracing to
achieve further cardinality reduction. With this deeper TupleId
resolution, it is now possible to narrow the TupleId search down to
child ScanNodes and UnionNodes only.

Note that this optimization still runs ONLY IF at least two grouping
expressions refer to the same TupleId. There is a benefit to running the
same optimization even when there is only a single expression per
TupleId, but we defer that work until we can provide a faster
TupleId-to-PlanNode mapping without repeating the plan tree traversal.

This patch also makes tuple-based reduction more conservative by capping
at input cardinality/limit, or using output cardinality if the producer
node is a UnionNode or has hard estimates. aggInputCardinality is still
indirectly influenced by the predicates and limits of child nodes.

The following PlannerTests (under
testdata/workloads/functional-planner/queries/PlannerTest/) revert their
cardinality estimates to their state prior to IMPALA-13405:
  tpcds/tpcds-q19.test
  tpcds/tpcds-q55.test
  tpcds_cpu_cost/tpcds-q03.test
  tpcds_cpu_cost/tpcds-q31.test
  tpcds_cpu_cost/tpcds-q47.test
  tpcds_cpu_cost/tpcds-q52.test
  tpcds_cpu_cost/tpcds-q57.test
  tpcds_cpu_cost/tpcds-q89.test

Several other planner tests have increased cardinality after this
change, but the numbers are still below their pre-IMPALA-13405 values.

Removed the nested-view planner test in agg-node-max-mem-estimate.test
that was first added by IMPALA-13405.
That same test has been duplicated by IMPALA-13480 at aggregation.test.

Testing:
- Pass core tests.

Change-Id: I11f59ccc469c24c1800abaad3774c56190306944
Reviewed-on: http://gerrit.cloudera.org:8080/21955
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q37.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q55.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q82.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q98.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/ddl.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q19.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q20.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q31.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q37.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q43-verbose.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q43.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q47.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q52.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q55.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q57.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q67.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q82.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q89.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q98.test
28 files changed, 1,212 insertions(+), 1,450 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/21955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I11f59ccc469c24c1800abaad3774c56190306944
Gerrit-Change-Number: 21955
Gerrit-PatchSet: 29
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
