This is an automated email from the ASF dual-hosted git repository.
morrySnow pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new b25f1c23d7a [doc](docs) Document ANALYZE hot value option (#3769)
b25f1c23d7a is described below
commit b25f1c23d7a3cb5247e5eefd1790cc10923624c1
Author: yujun <[email protected]>
AuthorDate: Wed May 27 17:41:45 2026 +0800
[doc](docs) Document ANALYZE hot value option (#3769)
Add documentation for WITH HOT VALUE in the current master/dev docs and
clarify that sample-based ANALYZE always collects hot values and rejects
the option when it is combined with WITH SAMPLE.
---
.../optimization-technology-principle/statistics.md | 17 +++++++++++++++++
docs/sql-manual/sql-statements/statistics/ANALYZE.md | 12 +++++++++++-
.../optimization-technology-principle/statistics.md | 17 +++++++++++++++++
.../sql-manual/sql-statements/statistics/ANALYZE.md | 11 ++++++++++-
4 files changed, 55 insertions(+), 2 deletions(-)
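
The behavior documented by this change can be summarized with a few statements. This is only a sketch assembled from the syntax and the table in the diff below; it assumes a `lineitem` table exists, and the exact error text for the rejected combination is not shown in the source, so that statement is left commented out.

```sql
-- Full collection without the option: hot values are not collected
-- (the existing default described in the docs).
ANALYZE TABLE lineitem WITH SYNC;

-- Full collection with the option: hot values are collected.
ANALYZE TABLE lineitem WITH SYNC WITH HOT VALUE;

-- Sample collection: hot values are always collected, no option needed.
ANALYZE TABLE lineitem WITH SAMPLE ROWS 100000;

-- Per the docs in this commit, combining the two is rejected.
-- Left commented out because the error output is not part of the source.
-- ANALYZE TABLE lineitem WITH SAMPLE ROWS 100000 WITH HOT VALUE;
```

The option matters only for full collection, since that is the one case where hot value collection is off by default.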
diff --git a/docs/query-acceleration/optimization-technology-principle/statistics.md b/docs/query-acceleration/optimization-technology-principle/statistics.md
index 8a509589eaf..3e7a6deffaf 100644
--- a/docs/query-acceleration/optimization-technology-principle/statistics.md
+++ b/docs/query-acceleration/optimization-technology-principle/statistics.md
@@ -98,6 +98,23 @@ Collect on the `l_orderkey` and `l_linenumber` columns of the `lineitem` table b
ANALYZE TABLE lineitem (l_orderkey, l_linenumber) WITH SAMPLE ROWS 100000;
```
+#### Hot Value Collection
+
+Hot values record frequently occurring column values in column statistics. They can help the optimizer estimate predicates on skewed data more accurately, but collecting them can consume more memory on high-cardinality columns.
+
+For manual full collection, use `WITH HOT VALUE` to collect hot values:
+
+```sql
+ANALYZE TABLE lineitem WITH SYNC WITH HOT VALUE;
+```
+
+Do not use `WITH HOT VALUE` with `WITH SAMPLE`; sample collection always collects hot values, so specifying this option with `WITH SAMPLE` is rejected. When `WITH HOT VALUE` is not specified, Doris keeps the existing behavior:
+
+| Collection method | Hot value collection |
+| ----------------- | -------------------- |
+| Full collection | Disabled |
+| Sample collection | Always enabled |
+
### Auto Collection
**Purpose**: Use a background thread to scan periodically and automatically maintain the freshness of statistics.
diff --git a/docs/sql-manual/sql-statements/statistics/ANALYZE.md b/docs/sql-manual/sql-statements/statistics/ANALYZE.md
index 47f21cffb8c..f6f1f76da36 100644
--- a/docs/sql-manual/sql-statements/statistics/ANALYZE.md
+++ b/docs/sql-manual/sql-statements/statistics/ANALYZE.md
@@ -14,7 +14,7 @@ This statement is used to collect column statistics. Statistics of columns can b
```sql
ANALYZE {TABLE <table_name> [ (<column_name> [, ...]) ] | DATABASE <database_name>}
- [ [ WITH SYNC ] [ WITH SAMPLE {PERCENT | ROWS} <sample_rate> ] ];
+ [ [ WITH SYNC ] [ WITH SAMPLE {PERCENT | ROWS} <sample_rate> ] [ WITH HOT VALUE ] ];
```
## Required Parameters
@@ -41,6 +41,10 @@ ANALYZE {TABLE <table_name> [ (<column_name> [, ...]) ] | DATABASE <database_nam
> Specify to use the sampling method for collection. When not specified, full collection is the default. <sample_rate> is the sampling parameter. When using PERCENT sampling, it specifies the sampling percentage; when using ROWS sampling, it specifies the number of sampled rows.
+**4. `WITH HOT VALUE`**
+
+> Specify to collect hot values during manual full collection. Hot values help the optimizer estimate predicates on skewed data more accurately, but collecting them can consume more memory on high-cardinality columns. When this option is not specified, Doris keeps the existing behavior: full collection does not collect hot values, while sample collection always collects hot values. `WITH HOT VALUE` is rejected when used together with `WITH SAMPLE`.
+
## Return Value
| Column | Note |
@@ -70,4 +74,10 @@ ANALYZE TABLE lineitem WITH SAMPLE PERCENT 10;
```sql
ANALYZE TABLE lineitem WITH SAMPLE ROWS 100000;
+```
+3. Collect full statistics and collect hot values.
+
+```sql
+ANALYZE TABLE lineitem WITH SYNC WITH HOT VALUE;
+```
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/optimization-technology-principle/statistics.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/optimization-technology-principle/statistics.md
index 962c15e837a..5d4f8a12665 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/optimization-technology-principle/statistics.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-acceleration/optimization-technology-principle/statistics.md
@@ -98,6 +98,23 @@ ANALYZE TABLE lineitem WITH SAMPLE ROWS 100000;
ANALYZE TABLE lineitem (l_orderkey, l_linenumber) WITH SAMPLE ROWS 100000;
```
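+#### Hot Value Collection
+
+Hot values record the frequently occurring column values in column statistics. They help the optimizer estimate predicate selectivity on skewed data more accurately, but collecting them on high-cardinality columns may consume more memory.
+
+For manual full collection, use `WITH HOT VALUE` to collect hot values: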
+
+```sql
+ANALYZE TABLE lineitem WITH SYNC WITH HOT VALUE;
+```
+
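+Do not use `WITH HOT VALUE` together with `WITH SAMPLE`; sample collection always collects hot values, so specifying this option together with `WITH SAMPLE` is rejected. When `WITH HOT VALUE` is not specified, Doris keeps the existing behavior:
+
+| Collection method | Hot value collection |
+| ----------------- | -------------------- |
+| Full collection   | Not collected        |
+| Sample collection | Always collected     |
+
### Auto Collection
**Purpose**: Use a background thread to scan periodically and automatically maintain the freshness of statistics.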
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/statistics/ANALYZE.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/statistics/ANALYZE.md
index b389d1c35a2..da363a8146c 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/statistics/ANALYZE.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-statements/statistics/ANALYZE.md
@@ -14,7 +14,7 @@
```sql
ANALYZE {TABLE <table_name> [ (<column_name> [, ...]) ] | DATABASE <database_name>}
- [ [ WITH SYNC ] [ WITH SAMPLE {PERCENT | ROWS} <sample_rate> ] ];
+ [ [ WITH SYNC ] [ WITH SAMPLE {PERCENT | ROWS} <sample_rate> ] [ WITH HOT VALUE ] ];
```
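## Required Parameters
@@ -41,6 +41,10 @@ ANALYZE {TABLE <table_name> [ (<column_name> [, ...]) ] | DATABASE <database_nam
> Specify to use the sampling method for collection. When not specified, full collection is the default. <sample_rate> is the sampling parameter: with PERCENT sampling it specifies the sampling percentage; with ROWS sampling it specifies the number of sampled rows.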
+**4. `WITH HOT VALUE`**
+
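+> Specify to collect hot values during manual full collection. Hot values help the optimizer estimate predicate selectivity on skewed data more accurately, but collecting them on high-cardinality columns may consume more memory. When this option is not specified, Doris keeps the existing behavior: full collection does not collect hot values, while sample collection always collects hot values. `WITH HOT VALUE` is rejected when used together with `WITH SAMPLE`.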
+
## Return Value
| Column | Note |
@@ -72,3 +76,8 @@ ANALYZE TABLE lineitem WITH SAMPLE PERCENT 10;
ANALYZE TABLE lineitem WITH SAMPLE ROWS 100000;
```
+3. Perform full collection on the lineitem table and collect hot values.
+
+```sql
+ANALYZE TABLE lineitem WITH SYNC WITH HOT VALUE;
+```