Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/21279 )
Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator ...................................................................... Patch Set 7: (5 comments) I took the liberty of editing few nits along with the rebases. I leave my comments here for some that is not quite straightforward and need discussion. http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java File fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java: http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java@72 PS7, Line 72: private static final double AGG_EXPR_COST_COEFFICIENT = 0.07; : // Cost per input row with single grouping expr with single agg expr : private static final double AGG_INPUT_SINGLE_GROUP_COST_COEFFICIENT = 0.2925; : // Cost per output row produced with single grouping expr and single agg expr : private static final double AGG_OUTPUT_SINGLE_GROUP_COST_COEFFICIENT = 2.6072; : // Cost per input row with multiple grouping exprs and single agg expr : private static final double AGG_INPUT_MULTI_GROUP_COST_COEFFICIENT = 1.3741; : // Cost per output row produced with multiple grouping exprs and single agg expr : private static final double AGG_OUTPUT_MULTI_GROUP_COST_COEFFICIENT = 4.5285; nit: Personally, I prefer related constant names to have common prefix, so these names becomes: COST_COEFFICIENT_AGG_EXPR COST_COEFFICIENT_AGG_INPUT COST_COEFFICIENT_AGG_OUTPUT COST_COEFFICIENT_AGG_INPUT_MULTI_GROUP COST_COEFFICIENT_AGG_OUTPUT_MULTI_GROUP I think it is easier to distinguish them if they have common prefix. Same comment for other coefficient constants. http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java File fe/src/main/java/org/apache/impala/planner/AggregationNode.java: http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@582 PS7, Line 582: fragment_.getNumInstances() I'm not sure about using getNumInstances() here. traverseEffectiveParallelism() will change parallelism/num_instances later. If traverseEffectiveParallelism() increase/decrease num instances, intermediateOutputCardinality will be underestimated/overestimated. I understand this is a chicken-and-egg problem. I wonder if simply using Analyzer.getMinParallelismPerNode() * getNumNodes() is enough. That would be the approximate lower bound for intermediateOutputCardinality. What is your observation during benchmarking on this issue? http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java File fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java: http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java@116 PS7, Line 116: fragment_.getNumInstances() Another chicken-and-egg problem of using getNumInstances() before traverseEffectiveParallelism(). http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/21279/7/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@633 PS7, Line 633: getNumInstances Another chicken-and-egg problem of using getNumInstances() before traverseEffectiveParallelism()? http://gerrit.cloudera.org:8080/#/c/21279/7/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test File testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test: http://gerrit.cloudera.org:8080/#/c/21279/7/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test@132 PS7, Line 132: HDFS partitions=1824/1824 files=1824 size=199.18MB : stored statistics: : table: rows=2.88M size=199.18MB nit: unnecessary change. Small bytes variability that is not asserted in Planner test. -- To view, visit http://gerrit.cloudera.org:8080/21279 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca Gerrit-Change-Number: 21279 Gerrit-PatchSet: 7 Gerrit-Owner: David Rorke <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Comment-Date: Thu, 11 Apr 2024 14:14:34 +0000 Gerrit-HasComments: Yes
