This is an automated email from the ASF dual-hosted git repository.

diwu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
     new 06806dc737 [typo](doc) fix bucket description and style (#20922)
06806dc737 is described below

commit 06806dc73764af78f3560758d0f89c11835a9321
Author: gnehil <adamlee...@gmail.com>
AuthorDate: Sat Jun 17 21:56:00 2023 +0800

    [typo](doc) fix bucket description and style (#20922)
---
 docs/en/docs/data-table/data-partition.md    | 51 +++++++++++++-------------
 docs/zh-CN/docs/data-table/data-partition.md | 54 ++++++++++++++--------------
 2 files changed, 54 insertions(+), 51 deletions(-)

diff --git a/docs/en/docs/data-table/data-partition.md b/docs/en/docs/data-table/data-partition.md
index 324d979baa..19076a7393 100644
--- a/docs/en/docs/data-table/data-partition.md
+++ b/docs/en/docs/data-table/data-partition.md
@@ -139,7 +139,7 @@ A few suggested rules for defining columns include:
 ### Partitioning and Bucketing
-Doris supports two layers of data partitioning. The first level is Partition, including range partitioning and list partitioning. The second is Bucket (Tablet), which only supports hash partitioning.
+Doris supports two layers of data partitioning. The first level is Partition, including range partitioning and list partitioning. The second is Bucket (Tablet), including hash and random partitioning.
 It is also possible to use one layer of data partitioning, If you do not write the partition statement when creating the table, Doris will generate a default partition at this time, which is transparent to the user. In this case, it only supports data bucketing.
@@ -232,26 +232,27 @@ It is also possible to use one layer of data partitioning, If you do not write t
 In the above example, we specify `date` (DATE type) and `id` (INT type) as the partitioning columns, so the resulting partitions will be as follows:
- ```
- *p201701_1000: [(MIN_VALUE, MIN_VALUE), ("2017-02-01", "1000") )
- *p201702_2000: [("2017-02-01", "1000"), ("2017-03-01", "2000") )
- *p201703_all: [("2017-03-01", "2000"), ("2017-04-01", MIN_VALUE))
+ ``` text
+ *p201701_1000: [(MIN_VALUE, MIN_VALUE), ("2017-02-01", "1000") )
+ *p201702_2000: [("2017-02-01", "1000"), ("2017-03-01", "2000") )
+ *p201703_all: [("2017-03-01", "2000"), ("2017-04-01", MIN_VALUE))
 ```
 Note that in the last partition, the user only specifies the partition value of the `date` column, so the system fills in `MIN_VALUE` as the partition value of the `id` column by default. When data are imported, the system will compare them with the partition values in order, and put the data in their corresponding partitions. Examples are as follows:
+ ``` text
+ * Data --> Partition
+ * 2017-01-01, 200 --> p201701_1000
+ * 2017-01-01, 2000 --> p201701_1000
+ * 2017-02-01, 100 --> p201701_1000
+ * 2017-02-01, 2000 --> p201702_2000
+ * 2017-02-15, 5000 --> p201702_2000
+ * 2017-03-01, 2000 --> p201703_all
+ * 2017-03-10, 1 --> p201703_all
+ * 2017-04-01, 1000 --> Unable to import
+ * 2017-05-01, 1000 --> Unable to import
 ```
- * Data --> Partition
- * 2017-01-01, 200 --> p201701_1000
- * 2017-01-01, 2000 --> p201701_1000
- * 2017-02-01, 100 --> p201701_1000
- * 2017-02-01, 2000 --> p201702_2000
- * 2017-02-15, 5000 --> p201702_2000
- * 2017-03-01, 2000 --> p201703_all
- * 2017-03-10, 1 --> p201703_all
- * 2017-04-01, 1000 --> Unable to import
- * 2017-05-01, 1000 --> Unable to import
- ```
+
 <version since="1.2.0">
@@ -275,7 +276,7 @@ Range partitioning also supports batch partitioning. For example, you can create
 * As in the `example_list_tbl` example above, when the table is created, the following three partitions are automatically created.
- ```
+ ```text
 p_cn: ("Beijing", "Shanghai", "Hong Kong")
 p_usa: ("New York", "San Francisco")
 p_jp: ("Tokyo")
@@ -284,7 +285,7 @@ Range partitioning also supports batch partitioning. For example, you can create
 * If we add Partition p_uk VALUES IN ("London"), the results will be as follows:
- ```
+ ```text
 p_cn: ("Beijing", "Shanghai", "Hong Kong")
 p_usa: ("New York", "San Francisco")
 p_jp: ("Tokyo")
@@ -293,7 +294,7 @@ Range partitioning also supports batch partitioning. For example, you can create
 * Now we delete Partition p_jp, the results will be as follows:
- ```
+ ```text
 p_cn: ("Beijing", "Shanghai", "Hong Kong")
 p_usa: ("New York", "San Francisco")
 p_uk: ("London")
@@ -301,7 +302,7 @@ Range partitioning also supports batch partitioning. For example, you can create
 List partitioning also supports **multi-column partitioning**. Examples are as follows:
- ```
+ ```text
 PARTITION BY LIST(`id`, `city`)
 (
 PARTITION `p1_city` VALUES IN (("1", "Beijing"), ("1", "Shanghai")),
@@ -312,14 +313,14 @@ Range partitioning also supports batch partitioning. For example, you can create
 In the above example, we specify `id` (INT type) and `city` (VARCHAR type) as the partitioning columns, so the resulting partitions will be as follows:
- ```
- * p1_city: [("1", "Beijing"), ("1", "Shanghai")]
- * p2_city: [("2", "Beijing"), ("2", "Shanghai")]
- * p3_city: [("3", "Beijing"), ("3", "Shanghai")]
+ ```text
+ * p1_city: [("1", "Beijing"), ("1", "Shanghai")]
+ * p2_city: [("2", "Beijing"), ("2", "Shanghai")]
+ * p3_city: [("3", "Beijing"), ("3", "Shanghai")]
 ```
 When data are imported, the system will compare them with the partition values in order, and put the data in their corresponding partitions. Examples are as follows:
- ```
+ ```text
 Data ---> Partition
 1, Beijing ---> p1_city
 1, Shanghai ---> p1_city
diff --git a/docs/zh-CN/docs/data-table/data-partition.md b/docs/zh-CN/docs/data-table/data-partition.md
index 68fcfe02fd..03a570821c 100644
--- a/docs/zh-CN/docs/data-table/data-partition.md
+++ b/docs/zh-CN/docs/data-table/data-partition.md
@@ -141,7 +141,7 @@ AGGREGATE KEY 数据模型中,所有没有指定聚合方式(SUM、REPLACE
 ### 分区和分桶
-Doris 支持两层的数据划分。第一层是 Partition,支持 Range 和 List 的划分方式。第二层是 Bucket(Tablet),仅支持 Hash 的划分方式。
+Doris 支持两层的数据划分。第一层是 Partition,支持 Range 和 List 的划分方式。第二层是 Bucket(Tablet),支持 Hash 和 Random 的划分方式。
 也可以仅使用一层分区,建表时如果不写分区的语句即可,此时Doris会生成一个默认的分区,对用户是透明的。使用一层分区时,只支持 Bucket 划分。下面我们来分别介绍下分区以及分桶:
@@ -166,6 +166,7 @@ Doris 支持两层的数据划分。第一层是 Partition,支持 Range 和 Li
 - 同时,也支持通过`FROM(...) TO (...) INTERVAL ...` 来批量创建分区。
 </version>
+
 - 通过 `VALUES [...)` 同时指定上下界比较容易理解。这里举例说明,当使用 `VALUES LESS THAN (...)` 语句进行分区的增删操作时,分区范围的变化情况:
@@ -240,26 +241,27 @@ Doris 支持两层的数据划分。第一层是 Partition,支持 Range 和 Li
 在以上示例中,我们指定 `date`(DATE 类型) 和 `id`(INT 类型) 作为分区列。以上示例最终得到的分区如下:
- ```
- * p201701_1000: [(MIN_VALUE, MIN_VALUE), ("2017-02-01", "1000") )
- * p201702_2000: [("2017-02-01", "1000"), ("2017-03-01", "2000") )
- * p201703_all: [("2017-03-01", "2000"), ("2017-04-01", MIN_VALUE))
+ ```text
+ * p201701_1000: [(MIN_VALUE, MIN_VALUE), ("2017-02-01", "1000") )
+ * p201702_2000: [("2017-02-01", "1000"), ("2017-03-01", "2000") )
+ * p201703_all: [("2017-03-01", "2000"), ("2017-04-01", MIN_VALUE))
 ```
 注意,最后一个分区用户缺省只指定了 `date` 列的分区值,所以 `id` 列的分区值会默认填充 `MIN_VALUE`。当用户插入数据时,分区列值会按照顺序依次比较,最终得到对应的分区。举例如下:
+ ``` text
+ * 数据 --> 分区
+ * 2017-01-01, 200 --> p201701_1000
+ * 2017-01-01, 2000 --> p201701_1000
+ * 2017-02-01, 100 --> p201701_1000
+ * 2017-02-01, 2000 --> p201702_2000
+ * 2017-02-15, 5000 --> p201702_2000
+ * 2017-03-01, 2000 --> p201703_all
+ * 2017-03-10, 1 --> p201703_all
+ * 2017-04-01, 1000 --> 无法导入
+ * 2017-05-01, 1000 --> 无法导入
 ```
- * 数据 --> 分区
- * 2017-01-01, 200 --> p201701_1000
- * 2017-01-01, 2000 --> p201701_1000
- * 2017-02-01, 100 --> p201701_1000
- * 2017-02-01, 2000 --> p201702_2000
- * 2017-02-15, 5000 --> p201702_2000
- * 2017-03-01, 2000 --> p201703_all
- * 2017-03-10, 1 --> p201703_all
- * 2017-04-01, 1000 --> 无法导入
- * 2017-05-01, 1000 --> 无法导入
- ```
+
 <version since="1.2.0">
 Range分区同样支持**批量分区**, 通过语句 `FROM ("2022-01-03") TO ("2022-01-06") INTERVAL 1 DAY` 批量创建按天划分的分区:2022-01-03到2022-01-06(不含2022-01-06日),分区结果如下:
@@ -319,21 +321,21 @@ Doris 支持两层的数据划分。第一层是 Partition,支持 Range 和 Li
 在以上示例中,我们指定 `id`(INT 类型) 和 `city`(VARCHAR 类型) 作为分区列。以上示例最终得到的分区如下:
 ```text
- * p1_city: [("1", "Beijing"), ("1", "Shanghai")]
- * p2_city: [("2", "Beijing"), ("2", "Shanghai")]
- * p3_city: [("3", "Beijing"), ("3", "Shanghai")]
+ * p1_city: [("1", "Beijing"), ("1", "Shanghai")]
+ * p2_city: [("2", "Beijing"), ("2", "Shanghai")]
+ * p3_city: [("3", "Beijing"), ("3", "Shanghai")]
 ```
 当用户插入数据时,分区列值会按照顺序依次比较,最终得到对应的分区。举例如下:
 ```text
- * 数据 ---> 分区
- * 1, Beijing ---> p1_city
- * 1, Shanghai ---> p1_city
- * 2, Shanghai ---> p2_city
- * 3, Beijing ---> p3_city
- * 1, Tianjin ---> 无法导入
- * 4, Beijing ---> 无法导入
+ * 数据 ---> 分区
+ * 1, Beijing ---> p1_city
+ * 1, Shanghai ---> p1_city
+ * 2, Shanghai ---> p2_city
+ * 3, Beijing ---> p3_city
+ * 1, Tianjin ---> 无法导入
+ * 4, Beijing ---> 无法导入
 ```
 2. **Bucket**