[
https://issues.apache.org/jira/browse/IMPALA-13480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17910470#comment-17910470
]
ASF subversion and git services commented on IMPALA-13480:
----------------------------------------------------------
Commit ce6be49c082fc32ea972b8dc38a7b08e447e39bf in impala's branch
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ce6be49c0 ]
IMPALA-13465: Trace TupleId further to reduce Agg cardinality
IMPALA-13405 does tuple analysis to lower AggregationNode cardinality.
It begins by focusing on the simple column SlotRef, but we can improve
this further by tracing the origin TupleId across views and intermediate
aggregation tuples. This patch implements deeper TupleId tracing to
achieve further cardinality reduction. With this deeper TupleId
resolution, it is now possible to narrow the TupleId search to the
children ScanNodes and UnionNodes only.
Note that this optimization still runs ONLY IF at least two grouping
expressions refer to the same TupleId. Running the same optimization
when there is only a single expression per TupleId would also be
beneficial, but we defer that work until we can provide a faster
TupleId to PlanNode mapping that does not repeat the plan tree
traversal.
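The eligibility rule above can be illustrated with a minimal sketch. This
is not the code from the patch: GroupingExpr, TupleGroupingSketch, and
eligibleTuples are hypothetical names, and the sketch assumes each
grouping expression has already been traced back to its origin TupleId.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a grouping expression whose origin TupleId
// has already been resolved (not an Impala class).
final class GroupingExpr {
  final String name;
  final int originTupleId;

  GroupingExpr(String name, int originTupleId) {
    this.name = name;
    this.originTupleId = originTupleId;
  }
}

final class TupleGroupingSketch {
  // Buckets grouping expressions by origin TupleId and keeps only the
  // TupleIds referenced by at least two grouping expressions.
  static Map<Integer, List<String>> eligibleTuples(List<GroupingExpr> exprs) {
    Map<Integer, List<String>> byTuple = new LinkedHashMap<>();
    for (GroupingExpr e : exprs) {
      byTuple.computeIfAbsent(e.originTupleId, k -> new ArrayList<>()).add(e.name);
    }
    // A single expression per TupleId is not eligible in this sketch.
    byTuple.values().removeIf(names -> names.size() < 2);
    return byTuple;
  }
}
```

With grouping expressions a.x and a.y from one tuple and b.z from
another, only the first tuple remains as a candidate for reduction.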
This patch also makes tuple-based reduction more conservative by capping
at input cardinality/limit, or using output cardinality if the producer
node is a UnionNode or has hard estimates. aggInputCardinality is still
indirectly influenced by the predicates and limits of child nodes.
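One possible reading of the capping rule above, as a rough sketch only.
The class and method names are hypothetical, a negative limit is assumed
to mean "no limit", and combining per-expression NDVs by multiplication
is an assumption for illustration rather than something stated in this
commit message.

```java
final class TupleCardinalityCapSketch {
  // Picks the cap for one producer node. Assumption: limit < 0 means the
  // node has no limit.
  static long producerCap(long inputCardinality, long limit,
      long outputCardinality, boolean isUnion, boolean hasHardEstimates) {
    if (isUnion || hasHardEstimates) return outputCardinality;
    return limit >= 0 ? Math.min(inputCardinality, limit) : inputCardinality;
  }

  // Multiplies the NDVs of the grouping expressions that share a tuple
  // (saturating at Long.MAX_VALUE), then applies the cap.
  // Assumes every NDV is non-negative.
  static long cappedNdvProduct(long[] ndvs, long cap) {
    long product = 1;
    for (long ndv : ndvs) {
      if (ndv != 0 && product > Long.MAX_VALUE / ndv) {
        product = Long.MAX_VALUE;
        break;
      }
      product *= ndv;
    }
    return Math.min(product, cap);
  }
}
```

For example, two expressions with NDVs of 100 and 50 over a scan with
input cardinality 1000 and no limit give min(5000, 1000) = 1000 in this
sketch.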
The following PlannerTests (under
testdata/workloads/functional-planner/queries/PlannerTest/) revert their
cardinality estimates to their state prior to IMPALA-13405:
- tpcds/tpcds-q19.test
- tpcds/tpcds-q55.test
- tpcds_cpu_cost/tpcds-q03.test
- tpcds_cpu_cost/tpcds-q31.test
- tpcds_cpu_cost/tpcds-q47.test
- tpcds_cpu_cost/tpcds-q52.test
- tpcds_cpu_cost/tpcds-q57.test
- tpcds_cpu_cost/tpcds-q89.test
Several other planner tests have increased cardinality after this
change, but the numbers are still below their pre-IMPALA-13405 values.
Removed the nested-view planner test in agg-node-max-mem-estimate.test
that was first added by IMPALA-13405. That same test has been duplicated
by IMPALA-13480 in aggregation.test.
Testing:
- Pass core tests.
Change-Id: I11f59ccc469c24c1800abaad3774c56190306944
Reviewed-on: http://gerrit.cloudera.org:8080/21955
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> PlannerTest.testAggregation should VALIDATE_CARDINALITY
> -------------------------------------------------------
>
> Key: IMPALA-13480
> URL: https://issues.apache.org/jira/browse/IMPALA-13480
> Project: IMPALA
> Issue Type: Bug
> Components: Test
> Affects Versions: Impala 4.4.0
> Reporter: Riza Suminto
> Assignee: Riza Suminto
> Priority: Major
> Fix For: Impala 4.5.0
>
>
> PlannerTest.testAggregation does not VALIDATE_CARDINALITY today. Validating
> cardinality will allow us to track our estimation quality and capture
> behavior changes such as
> https://github.com/apache/impala/blob/c83e5d97693fd3035b33622512d1584a5e56ce8b/fe/src/main/java/org/apache/impala/planner/AggregationNode.java#L74-L76