Hello Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/24153
to look at the new patch set (#3).
Change subject: IMPALA-14597: Initial HBO support
......................................................................
IMPALA-14597: Initial HBO support
This adds the initial support for Historical Based Optimizer (HBO).
Execution stats of finished queries are used in planning new queries.
In the first patch, we just track the cardinality, i.e., number of
output rows, of the PlanNodes. Other execution stats, e.g., peak memory
usage, average row size, cpu usage, etc. will be added in the future.
Execution stats are cached as key-value format. Each PlanNode defines a
key string consists of the key information that is used to match
historiical runs. The key in the HBO cache is a hash of this string. In
this patch, we just supports HdfsScanNode cardinality. Its key string
consists of
- scan path, e.g., fully qualified table name.
- conjuncts including partitioned conjuncts and other conjuncts.
- limit
----------------
Canonicalization
To share execution stats across similar queries, the conjuncts are
canonicalized using different levels of strategies. Two strategies are
added in this patch, listed in ascending order of risk:
- EXPR_REWRITE
- IGNORE_PARTITION_CONSTANTS
EXPR_REWRITE rewrites and normalizes the expressions to the same form,
e.g. "a IN (2, 1)" -> "a IN (1, 2)". It guarentees the expressions are
logically equivalent.
IGNORE_PARTITION_CONSTANTS includes everything of the EXPR_REWRITE
strategy but removes constants from equality predicates on partition
columns, including IN predicates. This strategy assumes all partitions
of the same table have similar characteristics. E.g., assuming ds is a
partition column, the strategy canonicalizes ds = '20260331' to
ds = <CONST> where <CONST> is a placeholder. However, for non-equality
partition predicates, e.g., ds > '20260331', it keeps them as-it.
Each PlanNode will have several HBO key strings each corresponds to a
canonicalization stragety. In the future, we can add more strategies,
e.g., to deal with range partition predicates.
------------------
Matching HBO Stats
After canonicalization, similar PlanNode runs have the same HBO key
string thus have the same hash. To track more historical runs, the value
in the HBO key-value cache is a list of different runs, distinguished by
their input stats including number of input rows, catalog version of the
table, total input size, etc. Basically only the number of input rows is
used. The other fields are only used when it's missing (due to numRows
of some partitions are missing), and in that case, only the EXPR_REWRITE
canonicalization strategy is used. See more details in HistoricalStats
class.
This matching mechanism is used when writing and reading historical
stats. Two new flags are added for this:
- hbo_similarity_threshold: Threshold in [0, 1] for comparing scan input
rows. Two runs are considered similar if the relative difference is
within this threshold. Default is 0.1 (10% tolerance).
- hbo_max_runs_per_key: Maximum number of historical runs to retain per
hash key in the HBO cache. When exceeded, the oldest run is evicted.
-----------------
Storing HBO Stats
Storing HBO stats is done asynchronously in the existing unregistration
thread pool, before the query profile is archived. So this adds no
latency to the query execution. A new query option,
store_historical_stats, is added to configure whether to store HBO stats
of a query.
Output cardinalities are extracted from the ExecSummary, and stored
using the hash strings generated from Frontend. There are some cases
that we decide to skip a HdfsScanNode since the ExecSummary might be
incomplete:
- the query failed or is cancelled.
- there are effective runtime filters, i.e., removed some data in the
HdfsScanNode. The output cardinality depends on when the runtime
filters arrive.
- the PlanNode execution is cancelled, e.g., due to the parent node
reaching its limit. The output cardinality changes if using this
PlanNode in other query plan.
In this patch, HBO stats are stored in memory so coordinators can't
share them. More storage supports will be added in IMPALA-14598.
The following flags are added to configure the cache size and
concurrency limit:
- hbo_in_memory_backend_cache_size_bytes
- hbo_in_memory_backend_concurrency_level
-----------------
Reading HBO Stats
HistoricalStats class provides HBO stats for any Frontend code. For
HdfsScanNode cardinality which is updated in computeCardinalities(), a
HBO key string is generated for each canonicalization strategy. They are
used in ascending order of risk. If the first strategy doesn't find any
match, we will try the next stragety. Cardinality from the HBO stats
overwrites the estimated cardinality.
A new query option, use_historical_stats, is added to configure whether
to use HBO stats in query planning.
-------------
Observability
In the query plan, a marker, "(from HBO)" is added for each cardinality
that are retrived from HBO stats, e.g.
01:SCAN HDFS [functional.alltypes b]
partition predicates: b.`year` = 2010
HDFS partitions=12/24 files=12 size=239.77KB
predicates: b.int_col = 0, b.string_col = '0'
runtime filters: RF000 -> b.id
row-size=21B cardinality=365 (from HBO)
The observability change is mainly contributed by Xuebin Su
([email protected]).
-----
Tests
- Adds a new table alltypes_nonpartitioned for testing non-partitioned
(but with year, month columns) scenarios.
- Added e2e tests
Generated-by: Claude Sonnet 4.5
Change-Id: I6ff60a8bd22c13c0ecad1198934cc96249b1015e
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/common/global-flags.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/CMakeLists.txt
A common/thrift/HBO.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlOptions.java
M fe/src/main/java/org/apache/impala/catalog/FeTable.java
A fe/src/main/java/org/apache/impala/planner/CanonicalizationStrategy.java
A fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A fe/src/main/java/org/apache/impala/service/CacheBackend.java
A fe/src/main/java/org/apache/impala/service/HistoricalStats.java
A fe/src/main/java/org/apache/impala/service/HistoricalStatsValue.java
A fe/src/main/java/org/apache/impala/service/InMemoryCacheBackend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A tests/query_test/test_hbo.py
37 files changed, 1,498 insertions(+), 14 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/24153/3
--
To view, visit http://gerrit.cloudera.org:8080/24153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ff60a8bd22c13c0ecad1198934cc96249b1015e
Gerrit-Change-Number: 24153
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>