Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21955 )

Change subject: IMPALA-13465: Trace TupleId further to reduce Agg cardinality
......................................................................


Patch Set 12:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21955/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/21955/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@286
PS12, Line 286:             "{}:AGGREGATE aggPhase={} aggInputCardinality={} 
aggIdx={} numGroups={} "
> What was the problem with getDisplayLabel? It should produce something simi
I'd like to make it uniform with the rest of 3 TRACE log below. unlike 
getDisplayLabel(), getId() does not do string concatenation and efficient to 
pass into trace log directly without wrapping it inside if 
(LOG.isTraceEnabled()) {


http://gerrit.cloudera.org:8080/#/c/21955/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@439
PS12, Line 439:             // TODO: Use producerNode.getCardinality() directly 
if predicate selectivity
> Is there a ticket that would lead to this TODO?
Not yet. I can file one, but it probably will take sometime until we can have 
good accuracy measure of predicate selectivity.


http://gerrit.cloudera.org:8080/#/c/21955/12/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@475
PS12, Line 475:       long ndvBasedNumGroup = 
estimateNumGroups(entry.getSecond(), aggInputCardinality);
> Couldn't we evaluate this when adding to maxCardinality? Which would mean w
Ack. I can do that in next patch set.


http://gerrit.cloudera.org:8080/#/c/21955/12/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test:

http://gerrit.cloudera.org:8080/#/c/21955/12/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test@43
PS12, Line 43: |  tuple-ids=3 row-size=52B cardinality=9.00M cost=53096454
> This went up a lot. What happened?
This undo recent change from IMPALA-13405 
(https://gerrit.cloudera.org/c/21860/15/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test#43)

Before, numGroup(item.i_brand & item.i_brand_id) = cardinality(02:SCAN HDFS) = 
375.
In this patch, we revert to use inputCardinality(02:SCAN HDFS) = 360.00K to be 
conservative.
Although 02:SCAN HDFS predicates: item.i_manufact_id = CAST(816 AS INT), we 
don't know for sure if that predicate will result in cardinality as low as 375. 
I'd like to avoid propagating cardinality misunderestimation, but like to hear 
more if there is other opinion on this. See my related comment at 
https://gerrit.cloudera.org/c/21955/8/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test#1328



--
To view, visit http://gerrit.cloudera.org:8080/21955
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11f59ccc469c24c1800abaad3774c56190306944
Gerrit-Change-Number: 21955
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Tue, 29 Oct 2024 00:15:02 +0000
Gerrit-HasComments: Yes

Reply via email to