[GitHub] [doris] zhannngchen commented on a diff in pull request #13068: [docs](unique-key-merge-on-write) add document for unique key merge o…

GitBox Fri, 30 Sep 2022 01:59:46 -0700


zhannngchen commented on code in PR #13068:
URL: https://github.com/apache/doris/pull/13068#discussion_r984370782



##########
docs/zh-CN/docs/data-table/data-model.md:
##########
@@ -308,7 +310,54 @@ PROPERTIES (
 );
 ```
 
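-That is, the Unique model can be fully replaced by the REPLACE method of the Aggregate model. Its internal implementation and data storage are also exactly the same, so no further examples are given here.
+That is, the merge-on-read implementation of the Unique model can be fully replaced by the REPLACE method of the Aggregate model. Its internal implementation and data storage are also exactly the same, so no further examples are given here.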
+
+### Merge on Write (new in version 1.2)
+
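+The merge-on-write implementation of the Unique model is a completely different model from the Aggregate model. Its query performance is closer to that of the Duplicate model, and in scenarios that require a primary key constraint it has a considerable query performance advantage over the Aggregate model, especially for aggregate queries.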
+
+In version 1.2, merge-on-write is a new feature and is disabled by default. Users can enable it by adding the following property:
+
+```
+"enable_unique_key_merge_on_write" = "true"
+```
+
+Still using the table above as an example, the CREATE TABLE statement is:
+
+```sql
+CREATE TABLE IF NOT EXISTS example_db.example_tbl
+(
+    `user_id` LARGEINT NOT NULL COMMENT "User ID",
+    `username` VARCHAR(50) NOT NULL COMMENT "User nickname",
+    `city` VARCHAR(20) COMMENT "User's city",
+    `age` SMALLINT COMMENT "User age",
+    `sex` TINYINT COMMENT "User gender",
+    `phone` LARGEINT COMMENT "User phone number",
+    `address` VARCHAR(500) COMMENT "User address",
+    `register_time` DATETIME COMMENT "User registration time"
+)
+UNIQUE KEY(`user_id`, `username`)
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
+PROPERTIES (
+"replication_allocation" = "tag.location.default: 1",
+"enable_unique_key_merge_on_write" = "true"
+);
+```
+
+The table structure created by this statement is completely different from that of the Aggregate model:
+
+| ColumnName    | Type         | AggregationType | Comment                |
+| ------------- | ------------ | --------------- | ---------------------- |
+| user_id       | LARGEINT     |                 | User ID                |
+| username      | VARCHAR(50)  |                 | User nickname          |
+| city          | VARCHAR(20)  | NONE            | User's city            |
+| age           | SMALLINT     | NONE            | User age               |
+| sex           | TINYINT      | NONE            | User gender            |
+| phone         | LARGEINT     | NONE            | User phone number      |
+| address       | VARCHAR(500) | NONE            | User address           |
+| register_time | DATETIME     | NONE            | User registration time |
+
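+On a Unique table with merge-on-write enabled, data that is overwritten or updated is marked as deleted during the load phase, and the new data is written to a new file. At query time, all data marked as deleted is filtered out at the file level, so the data that is read is always the latest. This eliminates the data aggregation step of merge-on-read and, in many cases, allows a variety of predicates to be pushed down. As a result it can bring a considerable performance improvement in many scenarios, especially for aggregate queries.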
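
The load-time marking and query-time filtering described in the last added paragraph can be sketched in Python. This is a simplified model under stated assumptions, not Doris code: the names `Segment`, `MergeOnWriteTable`, `load` and `scan` are hypothetical, a plain `set` stands in for whatever structure records deleted rows, and an in-memory `dict` stands in for the key lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One immutable data file plus the row positions marked as deleted."""
    rows: list                                 # list of (key, value) tuples
    deleted: set = field(default_factory=set)  # hypothetical delete markers


class MergeOnWriteTable:
    """Toy unique-key table that resolves duplicates at load time."""

    def __init__(self):
        self.segments = []  # segments in load order
        self.index = {}     # key -> (segment_no, row_no) of the live row

    def load(self, batch):
        """Write a batch as a new segment and mark overwritten rows deleted."""
        latest = {}
        for key, value in batch:
            latest[key] = value  # within one batch, the last value wins
        seg_no = len(self.segments)
        segment = Segment(rows=list(latest.items()))
        for row_no, (key, _) in enumerate(segment.rows):
            old = self.index.get(key)
            if old is not None:
                old_seg, old_row = old
                self.segments[old_seg].deleted.add(old_row)
            self.index[key] = (seg_no, row_no)
        self.segments.append(segment)

    def scan(self, predicate=lambda key, value: True):
        """Yield live rows only; no cross-segment merge happens at read time."""
        for segment in self.segments:
            for row_no, (key, value) in enumerate(segment.rows):
                if row_no not in segment.deleted and predicate(key, value):
                    yield key, value


table = MergeOnWriteTable()
table.load([(1, "Beijing"), (2, "Shanghai")])
table.load([(1, "Shenzhen")])  # overwrites key 1
latest_rows = sorted(table.scan())
stale_hits = list(table.scan(lambda key, value: value == "Beijing"))
```

Because the stale version of key 1 is already marked when the second batch is loaded, `scan` can apply a predicate row by row without first merging versions, which is the property that makes predicate pushdown safe in this model.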

Review Comment:
   updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

