[
https://issues.apache.org/jira/browse/IMPALA-13526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905642#comment-17905642
]
ASF subversion and git services commented on IMPALA-13526:
----------------------------------------------------------
Commit 2828e473710cc246dbdeb9e1da772c45881cfddb in impala's branch
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2828e4737 ]
IMPALA-13480: VALIDATE_CARDINALITY in some aggregation tests
This patch enable VALIDATE_CARDINALITY test options in several planner
tests that touch aggregation node. Enabling it has revealed three bugs.
First, in IMPALA-13405, cardinality estimate of MERGE phase aggregation
is not capped against the output cardinality of the EXCHANGE node. This
patch fix it by adding such capping.
Second, tuple-based optimization IMPALA-13405 can cause cardinality
underestimation if HAVING predicate exist. This is due to the default
selectivity of 10% applied for each HAVING predicate. This patch skip
tuple-based optimization if AggregationNode.conjuncts_ is ever not
empty. It will stay skipped on stats recompute, even if conjuncts_ is
transfered into the next Merge AggregationNode above the plan. The
optimization skip causes following PlannerTest (under
testdata/workloads/functional-planner/queries/PlannerTest/) to revert
their cardinality estimation to their state pior to IMPALA-13405:
- tpcds/tpcds-q39a.test
- tpcds/tpcds-q39b.test
- tpcds_cpu_cost/tpcds-q39a.test
- tpcds_cpu_cost/tpcds-q39b.test
In the future, we should consider raising the default selectivity for
HAVING predicate and undo this skipping logic (IMPALA-13542).
Third, is missing stats recompute after conjunct transfer in multi-phase
aggregation. This will be fixed separately by IMPALA-13526.
Testing:
- Enable cardinality validation in testMultipleDistinct*
- Update aggregation.test to reflect current PlannerTest output.
Added some test cases in aggregation.test.
- Run and pass TpcdsPlannerTest and TpcdsCpuPlannerTest.
- Selectively run some more planner tests that touch AggregationNode and
pass them.
Change-Id: Iadb4af9fd65fdb85b66fae1e403ccec8ca5eb102
Reviewed-on: http://gerrit.cloudera.org:8080/22184
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Inconsistent Agg node stats recomputation.
> ------------------------------------------
>
> Key: IMPALA-13526
> URL: https://issues.apache.org/jira/browse/IMPALA-13526
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 4.4.0
> Reporter: Riza Suminto
> Assignee: Riza Suminto
> Priority: Major
>
> Within DistributedPlanner.java, there are several place where Planner need to
> insert extra merge aggregation node. It require transferring HAVING conjuncts
> from preaggregation node to merge aggregation, unsetting limit, and recompute
> stats of preaggregation node. However, the stats recompute is not
> consistently done, and there might be an inefficient recompute happening.
> Example of inefficient recomputes:
> https://github.com/apache/impala/blob/88e0e4e8baa97f7fded12230b14232dc85cf6d79/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java#L1074-L1077
> Example of missing recompute for phase2AggNode:
> https://github.com/apache/impala/blob/88e0e4e8baa97f7fded12230b14232dc85cf6d79/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java#L1143-L1168
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]