Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21955 )
Change subject: IMPALA-13465: Trace TupleId further to reduce Agg cardinality ...................................................................... Patch Set 12: (4 comments) http://gerrit.cloudera.org:8080/#/c/21955/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/21955/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@286 PS12, Line 286: "{}:AGGREGATE aggPhase={} aggInputCardinality={} aggIdx={} numGroups={} " > What was the problem with getDisplayLabel? It should produce something simi I'd like to make it uniform with the rest of 3 TRACE log below. unlike getDisplayLabel(), getId() does not do string concatenation and efficient to pass into trace log directly without wrapping it inside if (LOG.isTraceEnabled()) { http://gerrit.cloudera.org:8080/#/c/21955/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@439 PS12, Line 439: // TODO: Use producerNode.getCardinality() directly if predicate selectivity > Is there a ticket that would lead to this TODO? Not yet. I can file one, but it probably will take sometime until we can have good accuracy measure of predicate selectivity. http://gerrit.cloudera.org:8080/#/c/21955/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@475 PS12, Line 475: long ndvBasedNumGroup = estimateNumGroups(entry.getSecond(), aggInputCardinality); > Couldn't we evaluate this when adding to maxCardinality? Which would mean w Ack. I can do that in next patch set. http://gerrit.cloudera.org:8080/#/c/21955/12/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test File testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test: http://gerrit.cloudera.org:8080/#/c/21955/12/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test@43 PS12, Line 43: | tuple-ids=3 row-size=52B cardinality=9.00M cost=53096454 > This went up a lot. What happened? This undo recent change from IMPALA-13405 (https://gerrit.cloudera.org/c/21860/15/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test#43) Before, numGroup(item.i_brand & item.i_brand_id) = cardinality(02:SCAN HDFS) = 375. In this patch, we revert to use inputCardinality(02:SCAN HDFS) = 360.00K to be conservative. Although 02:SCAN HDFS predicates: item.i_manufact_id = CAST(816 AS INT), we don't know for sure if that predicate will result in cardinality as low as 375. I'd like to avoid propagating cardinality misunderestimation, but like to hear more if there is other opinion on this. See my related comment at https://gerrit.cloudera.org/c/21955/8/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test#1328 -- To view, visit http://gerrit.cloudera.org:8080/21955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I11f59ccc469c24c1800abaad3774c56190306944 Gerrit-Change-Number: 21955 Gerrit-PatchSet: 12 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Tue, 29 Oct 2024 00:15:02 +0000 Gerrit-HasComments: Yes
