Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21279 )

Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and 
NonGroupingAggregator
......................................................................


Patch Set 7:

(5 comments)

I took the liberty of editing few nits along with the rebases.
I leave my comments here for some that is not quite straightforward and need 
discussion.

http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java:

http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@72
PS7, Line 72:   private static final double AGG_EXPR_COST_COEFFICIENT = 0.07;
            :   // Cost per input row with single grouping expr with single agg 
expr
            :   private static final double 
AGG_INPUT_SINGLE_GROUP_COST_COEFFICIENT = 0.2925;
            :   // Cost per output row produced with single grouping expr and 
single agg expr
            :   private static final double 
AGG_OUTPUT_SINGLE_GROUP_COST_COEFFICIENT = 2.6072;
            :   // Cost per input row with multiple grouping exprs and single 
agg expr
            :   private static final double 
AGG_INPUT_MULTI_GROUP_COST_COEFFICIENT = 1.3741;
            :   // Cost per output row produced with multiple grouping exprs 
and single agg expr
            :   private static final double 
AGG_OUTPUT_MULTI_GROUP_COST_COEFFICIENT = 4.5285;
nit: Personally, I prefer related constant names to have common prefix, so 
these names becomes:

COST_COEFFICIENT_AGG_EXPR
COST_COEFFICIENT_AGG_INPUT
COST_COEFFICIENT_AGG_OUTPUT
COST_COEFFICIENT_AGG_INPUT_MULTI_GROUP
COST_COEFFICIENT_AGG_OUTPUT_MULTI_GROUP

I think it is easier to distinguish them if they have common prefix. Same 
comment for other coefficient constants.


http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@582
PS7, Line 582: fragment_.getNumInstances()
I'm not sure about using getNumInstances() here. traverseEffectiveParallelism() 
will change parallelism/num_instances later. If traverseEffectiveParallelism() 
increase/decrease num instances, intermediateOutputCardinality will be 
underestimated/overestimated. I understand this is a chicken-and-egg problem.

I wonder if simply using Analyzer.getMinParallelismPerNode() * getNumNodes() is 
enough. That would be the approximate lower bound for 
intermediateOutputCardinality. What is your observation during benchmarking on 
this issue?


http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
File fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java:

http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java@116
PS7, Line 116: fragment_.getNumInstances()
Another chicken-and-egg problem of using getNumInstances() before 
traverseEffectiveParallelism().


http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@633
PS7, Line 633: getNumInstances
Another chicken-and-egg problem of using getNumInstances() before 
traverseEffectiveParallelism()?


http://gerrit.cloudera.org:8080/#/c/21279/7/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test:

http://gerrit.cloudera.org:8080/#/c/21279/7/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test@132
PS7, Line 132:    HDFS partitions=1824/1824 files=1824 size=199.18MB
             :    stored statistics:
             :      table: rows=2.88M size=199.18MB
nit: unnecessary change. Small bytes variability that is not asserted in 
Planner test.



--
To view, visit http://gerrit.cloudera.org:8080/21279
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca
Gerrit-Change-Number: 21279
Gerrit-PatchSet: 7
Gerrit-Owner: David Rorke <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Comment-Date: Thu, 11 Apr 2024 14:14:34 +0000
Gerrit-HasComments: Yes

Reply via email to