(doris-website) branch master updated: [docs] Improve POC guide structure and query pitfalls (#3469)

dataroaring Mon, 16 Mar 2026 06:14:24 -0700

This is an automated email from the ASF dual-hosted git repository.

dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git



The following commit(s) were added to refs/heads/master by this push:
     new 5a0acca4710 [docs] Improve POC guide structure and query pitfalls (#3469)
5a0acca4710 is described below

commit 5a0acca47109a19b92e5a607dbbca6ee6824f7b5
Author: Yongqiang YANG <[email protected]>
AuthorDate: Mon Mar 16 06:13:37 2026 -0700

    [docs] Improve POC guide structure and query pitfalls (#3469)
    
    ## Summary
    - Nest "Example Templates" under "Table Design" since they are CREATE
    TABLE examples illustrating the design decisions above
    - Rewrite query pitfalls in action-oriented style to match the load
    section ("Avoid...", "Don't...")
    - Add over-bucketing pitfall (up to 50% query perf degradation)
    - Add under-bucketing pitfall (limited CPU parallelism)
    - Expand sort key pitfall: explain 36-byte prefix truncation behavior
    differs from PostgreSQL, recommend inverted indexes
    - Update Chinese translation with all the same changes
    
    ## Test plan
    - [ ] Verify EN page renders correctly at
    `/docs/dev/gettingStarted/before-you-start-the-poc`
    - [ ] Verify zh-CN page renders correctly
    - [ ] Check all anchor links (`#bucketing`, `#sort-key`) resolve
    correctly
    
    🤖 Generated with [Claude Code](https://claude.com/claude-code)
    
    ---------
    
    Co-authored-by: Claude Opus 4.6 <[email protected]>
---
 docs/gettingStarted/before-you-start-the-poc.md          | 16 +++++++++-------
 .../current/gettingStarted/before-you-start-the-poc.md   | 16 +++++++++-------
 2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/docs/gettingStarted/before-you-start-the-poc.md b/docs/gettingStarted/before-you-start-the-poc.md
index 92aae4f141c..c507a505517 100644
--- a/docs/gettingStarted/before-you-start-the-poc.md
+++ b/docs/gettingStarted/before-you-start-the-poc.md
@@ -42,11 +42,11 @@ Default is **Random bucketing** (recommended for Duplicate Key tables). Use `DIS
 1. **Multiple of BE count** to ensure even data distribution. When BEs are added later, queries typically scan multiple partitions, so performance holds up.
 2. **As low as possible** to avoid small files.
 3. **Compressed data per bucket ≤ 20 GB** (≤ 10 GB for Unique Key). Check with `SHOW TABLETS FROM your_table`.
-4. **No more than 128 per partition.** Consider partitioning first if you need more.
+4. **No more than 128 per partition.** Consider partitioning first if you need more. In extreme cases the upper bound is 1024, but this is rarely needed in production.
 
-## Example Templates
+### Example Templates
 
-### Log / Event Analytics
+#### Log / Event Analytics
 
 ```sql
 CREATE TABLE app_logs
@@ -63,7 +63,7 @@ AUTO PARTITION BY RANGE(date_trunc(`log_time`, 'day'))
 DISTRIBUTED BY RANDOM BUCKETS 10;
 ```
 
-### Real-Time Dashboard with Upsert (CDC)
+#### Real-Time Dashboard with Upsert (CDC)
 
 ```sql
 CREATE TABLE user_profiles
@@ -78,7 +78,7 @@ UNIQUE KEY(user_id)
 DISTRIBUTED BY HASH(user_id) BUCKETS 10;
 ```
 
-### Metrics Aggregation
+#### Metrics Aggregation
 
 ```sql
 CREATE TABLE site_metrics
@@ -107,7 +107,9 @@ See [Load Best Practices](../data-operate/import/load-best-practices).
 
 ### Query
 
-- **Data skew.** Check tablet sizes with `SHOW TABLETS`. Switch to Random bucketing or a higher-cardinality bucket column if sizes vary significantly.
-- **Wrong sort key order.** See [Sort Key](#sort-key).
+- **Avoid data skew.** Check tablet sizes with `SHOW TABLETS`. Switch to Random bucketing or a higher-cardinality bucket column if sizes vary significantly.
+- **Don't over-bucket.** Too many small tablets create scheduling overhead and can degrade query performance by up to 50%. See [Bucketing](#bucketing) for sizing guidelines.
+- **Don't under-bucket.** Too few tablets limit CPU parallelism. See [Bucketing](#bucketing) for sizing guidelines.
+- **Put the right columns in the sort key.** Unlike systems such as PostgreSQL, Doris only indexes the first 36 bytes of key columns and stops at the first VARCHAR. Columns beyond this prefix won't benefit from the sort key. Add [inverted indexes](../table-design/index/inverted-index/overview) for those columns. See [Sort Key](#sort-key).
 
 See [Query Profile](../query-acceleration/query-profile) to diagnose slow queries.
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/gettingStarted/before-you-start-the-poc.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/gettingStarted/before-you-start-the-poc.md
index 641cce1aa9f..11b5c960f97 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/gettingStarted/before-you-start-the-poc.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/gettingStarted/before-you-start-the-poc.md
@@ -42,11 +42,11 @@
 1. **设为 BE 数量的整数倍**，确保数据均匀分布。后续扩容 BE 时，查询通常涉及多个分区，性能不会受影响。
 2. **尽可能少**，避免小文件。
 3. **每个分桶的压缩后数据 ≤ 20 GB**（Unique Key 表 ≤ 10 GB）。可通过 `SHOW TABLETS FROM your_table` 查看。
-4. **每个分区不超过 128 个分桶。**需要更多时优先考虑分区。
+4. **每个分区不超过 128 个分桶。**需要更多时优先考虑分区。极端情况下上限为 1024，但生产环境中很少需要。
 
-## 建表模板
+### 建表模板
 
-### 日志 / 事件分析
+#### 日志 / 事件分析
 
 ```sql
 CREATE TABLE app_logs
@@ -63,7 +63,7 @@ AUTO PARTITION BY RANGE(date_trunc(`log_time`, 'day'))
 DISTRIBUTED BY RANDOM BUCKETS 10;
 ```
 
-### 实时看板与 Upsert（CDC）
+#### 实时看板与 Upsert（CDC）
 
 ```sql
 CREATE TABLE user_profiles
@@ -78,7 +78,7 @@ UNIQUE KEY(user_id)
 DISTRIBUTED BY HASH(user_id) BUCKETS 10;
 ```
 
-### 指标聚合
+#### 指标聚合
 
 ```sql
 CREATE TABLE site_metrics
@@ -107,7 +107,9 @@ DISTRIBUTED BY HASH(site_id) BUCKETS 10;
 
 ### 查询
 
-- **数据倾斜。**通过 `SHOW TABLETS` 检查 tablet 大小。差异明显时切换为 Random 分桶或选择基数更高的分桶列。
-- **排序键顺序不当。**参见 [Sort Key（排序键）](#sort-key排序键)。
+- **避免数据倾斜。**通过 `SHOW TABLETS` 检查 tablet 大小。差异明显时切换为 Random 分桶或选择基数更高的分桶列。
+- **不要分桶过多。**过多的小 tablet 会产生调度开销，查询性能最多可下降 50%。参见[分桶](#分桶)了解分桶数选择。
+- **不要分桶过少。**过少的 tablet 会限制 CPU 并行度。参见[分桶](#分桶)了解分桶数选择。
+- **正确设置排序键。**与 PostgreSQL 等系统不同，Doris 仅对排序键的前 36 字节建立索引，且遇到 VARCHAR 会立即截断。超出前缀范围的列无法从排序键受益，需添加[倒排索引](../table-design/index/inverted-index/overview)。参见 [Sort Key（排序键）](#sort-key排序键)。
 
 诊断慢查询请使用 [Query Profile](../query-acceleration/query-profile)。
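
The bucketing rules touched by this change (a multiple of the BE count, as low as possible, compressed data per bucket ≤ 20 GB or ≤ 10 GB for Unique Key, no more than 128 per partition) can be read as a small calculation. The following is a minimal Python sketch of that reading. It is not part of the commit or of Doris: the function name `suggest_buckets` and its parameters are hypothetical, and treating the 128 limit as a hard error (rather than allowing the rarely needed 1024 upper bound) is an assumption made for illustration.

```python
import math


def suggest_buckets(compressed_gb: float, be_count: int, unique_key: bool = False) -> int:
    """Suggest a bucket count for one partition (hypothetical helper).

    compressed_gb: expected compressed data size of the partition, in GB.
    be_count:      number of BE nodes in the cluster.
    unique_key:    True for Unique Key tables (10 GB cap instead of 20 GB).
    """
    if be_count < 1:
        raise ValueError("be_count must be at least 1")
    if compressed_gb < 0:
        raise ValueError("compressed_gb must not be negative")

    # Rule 3: per-bucket cap on compressed data.
    cap_gb = 10 if unique_key else 20
    needed = max(1, math.ceil(compressed_gb / cap_gb))

    # Rules 1 and 2: the smallest multiple of the BE count that satisfies the cap.
    buckets = math.ceil(needed / be_count) * be_count

    # Rule 4: beyond 128 buckets per partition, partition first (assumed hard stop here).
    if buckets > 128:
        raise ValueError(
            f"{buckets} buckets exceeds 128 per partition; consider partitioning first"
        )
    return buckets


if __name__ == "__main__":
    print(suggest_buckets(150, 3))
    print(suggest_buckets(150, 3, unique_key=True))
```

Under these assumptions, a 150 GB (compressed) partition on 3 BEs yields 9 buckets for a Duplicate Key table and 15 for a Unique Key table. Actual tablet sizes should still be checked with `SHOW TABLETS FROM your_table`, as the guide recommends.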

