Hello Kurt Deschler, Riza Suminto, Abhishek Rawat, Wenzhe Zhou, Impala Public
Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/21279
to look at the new patch set (#16).
Change subject: IMPALA-12657: Improve ProcessingCost of ScanNode and
NonGroupingAggregator
......................................................................
IMPALA-12657: Improve ProcessingCost of ScanNode and NonGroupingAggregator
This patch improves the accuracy of the CPU ProcessingCost estimates for
several of the CPU intensive operators by basing the costs on benchmark
data. The general approach for a given operator was to run a set of queries
that exercised the operator under various conditions (e.g. large vs small
row sizes and row counts, varying NDV, different file formats, etc) and
capture the CPU time spent per unit of work (the unit of work might be
measured as some number of rows, some number of bytes, some number of
predicates evaluated, or some combination of these). The data was then
analyzed in an attempt to fit a simple model that would allow us to
predict CPU consumption of a given operator based on information available
at planning time.
For example, the CPU ProcessingCost for a Parquet scan is estimated as:
TotalCost = (0.0144 * BytesMaterialized) + (0.0281 * Rows * Predicate Count)
The coefficients (0.0144 and 0.0281) are derived from benchmarking
scans under a variety of conditions. Similar cost functions and coefficients
were derived for all of the benchmarked operators. The coefficients for all
the operators are normalized such that a single unit of cost equates to
roughly 100 nanoseconds of CPU time on a r5d.4xlarge instance. So we would
predict an operator with a cost of 10,000,000 would complete in roughly one
second on a single core.
Limitations:
* Costing only addresses CPU time spent and doesn't account for any IO
or other wait time.
* Benchmarking scenarios didn't provide comprehensive coverage of the
full range of data types, distributions, etc. More thorough
benchmarking could improve the costing estimates further.
* This initial patch only covers a subset of the operators, focusing
on those that are most common and most CPU intensive. Specifically
the following operators are covered by this patch. All others
continue to use the previous ProcessingCost code:
AggregationNode
DataStreamSink (exchange sender)
ExchangeNode
HashJoinNode
HdfsScanNode
HdfsTableSink
NestedLoopJoinNode
SortNode
UnionNode
Benchmark-based costing of the remaining operators will be covered by
a future patch.
Future patches will automate the collection and analysis of the benchmark
data and the computation of the cost coefficients to simplify maintenance
of the costing as performance changes over time.
Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca
---
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/BaseProcessingCost.java
M fe/src/main/java/org/apache/impala/planner/CostingSegment.java
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/ProcessingCost.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M
testdata/workloads/functional-planner/queries/PlannerTest/processing-cost-plan-admission-slots.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q10a.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q13.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14a.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14b.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q15.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q16.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q17.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q18.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q19.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q20.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q21.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q22.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23b.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q24a.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q24b.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q25.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q26.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q27.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q28.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q29.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q30.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q31.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q32.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q33.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q34.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q35a.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q36.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q37.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q38.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q39a.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q39b.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q40.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q41.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q42.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q43.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q44.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q45.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q46.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q47.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q48.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q49.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q50.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q51.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q52.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q53.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q54.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q55.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q56.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q57.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q58.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q59.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q60.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q61.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q62.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q63.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q64.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q65.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q66.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q67.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q68.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q69.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q70.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q71.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q72.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q73.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q74.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q75.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q76.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q77.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q78.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q79.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q80.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q81.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q82.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q83.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q84.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q85.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q86.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q87.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q88.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q89.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q90.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q91.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q92.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q93.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q94.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q95.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q96.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q97.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q98.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q99.test
M tests/custom_cluster/test_executor_groups.py
M tests/query_test/test_insert.py
127 files changed, 18,603 insertions(+), 18,212 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/21279/16
--
To view, visit http://gerrit.cloudera.org:8080/21279
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icf1edd48d4ae255b7b3b7f5b228800d7bac7d2ca
Gerrit-Change-Number: 21279
Gerrit-PatchSet: 16
Gerrit-Owner: David Rorke <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>