Hi, Ling Miao.

Thanks for your advice.
I'll think about it and get back to you.

Jianliang Qi

On Mon, Feb 22, 2021 at 5:31 PM ling miao <lingm...@apache.org> wrote:

> Hi JianLiang,
>
> Thank you for your proposal. I think this feature is still needed for
> some large dimension tables.
> It means that data not generated according to time can also be
> partitioned.
>
> Of course, since this changes the metadata, all load, query, and other
> DDL operations may need to be changed or extended.
> Please take this into account in the design.
>
> Ling Miao
>
> ye qi <jianliang5...@gmail.com> wrote on Sun, Feb 21, 2021 at 1:12 AM:
>
>> List partition
>>
>> Doris currently only supports Range partitioning, where data is usually
>> partitioned by time columns.
>>
>> However, in some scenarios, users want to partition by some enumerated
>> values of columns, such as by city, etc.
>>
>> Design
>>
>> To add support for List partitioning, the following functional points need
>> to be considered.
>>
>>    1. Support for List partition syntax in creating table statements.
>>    2. Support for syntax to add and drop List partitions.
>>    3. Support for List partitioning in various load operations.
>>    4. Support for List partition pruning during query.
>>
>> List partitioned tables do not need to consider dynamic partitioning.
>>
>> Detailed design
>>
>> Syntax
>>
>> The main changes involved here include:
>>
>>    1. Implementation of ListPartitionDesc, a subclass of the parsing class
>>    PartitionDesc.
>>    2. Implementation of ListPartitionInfo, a subclass of the metadata class
>>    PartitionInfo.
>>    3. Support for parsing and checking ListPartitionDesc in CreateTableStmt.
>>    4. Support for creating List partitioned tables in the Catalog class.
>>    5. Metadata persistence-related changes.
>>
>> The syntax is modeled on MySQL and Oracle.
>>
>> Single partition column
>>
>> CREATE TABLE tb1 (
>>     k1 int, k2 varchar(128), k3 int, v1 int, v2 int
>> )
>> PARTITION BY LIST(k1)
>> (
>>     PARTITION p1 VALUES IN ("1", "3", "5"),
>>     PARTITION p2 VALUES IN ("2", "4", "6"),
>>     ...
>> )
>> ...
>> ;
>>
>> Multiple partition columns
>>
>> CREATE TABLE tb2 (
>>     k1 int, k2 varchar(128), k3 int, v1 int, v2 int
>> )
>> PARTITION BY LIST(k1, k2)
>> (
>>     PARTITION p1 VALUES IN (("1", "beijing"), ("1", "shanghai")),
>>     PARTITION p2 VALUES IN (("2", "beijing"), ("2", "shanghai"), ("2", "tianjin")),
>>     PARTITION p3 VALUES IN (("3", "beijing")),
>>     ...
>> )
>> ...
>> ;
>>
>> NOTE: Partition values must be unique; the same value must not appear in
>> more than one partition.
>>
>> Add partition
>>
>> ALTER TABLE tb1 ADD PARTITION p4 VALUES IN ("7", "8", "9");
>> ALTER TABLE tb2 ADD PARTITION p4 VALUES IN (("4", "tianjin"));
>>
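The uniqueness note and the ADD PARTITION statements above imply a check
when a partition is added. A minimal sketch of such a check, in Python for
brevity, could look like this. It is not Doris code: the function name
check_add_partition and the dict-based layout are assumptions made only for
illustration.

```python
from typing import Dict, List, Tuple

# One string per partition column, as written in the VALUES IN clause.
Key = Tuple[str, ...]


def check_add_partition(existing: Dict[str, List[Key]],
                        name: str,
                        values: List[Key]) -> None:
    """Raise ValueError if adding the partition would break uniqueness.

    Hypothetical helper; `existing` maps partition name -> value tuples.
    """
    if name in existing:
        raise ValueError(f"partition {name} already exists")
    if len(set(values)) != len(values):
        raise ValueError(f"duplicate value inside partition {name}")
    # Reverse index: value tuple -> partition that already owns it.
    owner = {key: part for part, keys in existing.items() for key in keys}
    for key in values:
        if key in owner:
            raise ValueError(
                f"value {key} already belongs to partition {owner[key]}")


# The tb2 layout from the CREATE TABLE example.
tb2 = {
    "p1": [("1", "beijing"), ("1", "shanghai")],
    "p2": [("2", "beijing"), ("2", "shanghai"), ("2", "tianjin")],
    "p3": [("3", "beijing")],
}
check_add_partition(tb2, "p4", [("4", "tianjin")])  # accepted, no exception
```

With this layout, adding a partition that reuses ("2", "tianjin") would be
rejected because that value already belongs to p2.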
>> Load
>>
>> The current load methods of Doris include Stream Load, INSERT, Routine
>> Load, Broker Load, Hadoop Load, Spark Load.
>>
>> Among them, Stream Load, INSERT, Routine Load, and Broker Load all use the
>> TabletSink class for data distribution. The first phase will add List
>> partition support for these load operations.
>>
>> The main changes involved include:
>>
>>    1. Changes related to the Descriptors.TOlapTablePartitionParam structure
>>    in the Thrift structure TOlapTableSink.
>>    2. Changes related to the OlapTablePartition object in the OlapTableSink
>>    class on the BE side.
>>
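For the load path, the sink has to map each row to the partition whose value
list contains the row's partition-column values. A rough sketch of that
lookup, in Python, follows. The names build_lookup and route_row are
hypothetical, and the proposal does not say what should happen to rows that
match no partition, so the sketch returns None and leaves that decision to
the caller.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Key = Tuple[str, ...]


def build_lookup(partitions: Dict[str, List[Key]]) -> Dict[Key, str]:
    """Flatten partition name -> value tuples into value tuple -> name."""
    lookup: Dict[Key, str] = {}
    for part, keys in partitions.items():
        for key in keys:
            lookup[key] = part
    return lookup


def route_row(lookup: Dict[Key, str],
              row: dict,
              partition_columns: Sequence[str]) -> Optional[str]:
    """Return the partition name for a row, or None if no list matches."""
    # Values are compared as strings because the DDL quotes them ("1", "3").
    key = tuple(str(row[col]) for col in partition_columns)
    return lookup.get(key)


# The tb1 layout from the single-column example.
tb1 = {
    "p1": [("1",), ("3",), ("5",)],
    "p2": [("2",), ("4",), ("6",)],
}
lookup = build_lookup(tb1)
target = route_row(lookup, {"k1": 3, "k2": "beijing"}, ["k1"])
```

A hash lookup keeps the per-row cost constant regardless of how many values
each partition lists.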
>> Query
>>
>> Queries mainly need List partition pruning to be implemented.
>>
>> The main changes involved include:
>>
>>    1. Implementing the subclass ListPartitionPruner of PartitionPruner
>>
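As an illustration of what List partition pruning means for an IN (or
equality) predicate on one partition column, here is a small Python sketch.
It is an assumption about the general idea, not the planned
ListPartitionPruner; the function name prune_partitions is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, ...]


def prune_partitions(partitions: Dict[str, List[Key]],
                     column_index: int,
                     wanted: Iterable) -> List[str]:
    """Keep partitions holding at least one value that can match.

    `column_index` is the position of the filtered column inside the
    partition key; `wanted` holds the literals of the IN predicate.
    """
    wanted_set = {str(v) for v in wanted}
    return [
        part
        for part, keys in partitions.items()
        if any(key[column_index] in wanted_set for key in keys)
    ]


# The tb2 layout from the multi-column example.
tb2 = {
    "p1": [("1", "beijing"), ("1", "shanghai")],
    "p2": [("2", "beijing"), ("2", "shanghai"), ("2", "tianjin")],
    "p3": [("3", "beijing")],
}
# WHERE k2 = 'tianjin' only needs p2.
selected = prune_partitions(tb2, 1, ["tianjin"])
```

A predicate on k1 such as k1 IN (1, 3) would keep p1 and p3 and skip p2.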
>> Partition related
>>
>> Support operations related to partitioned tables, such as recover,
>> truncate, temporary partition, restore, replace, etc.
>>
>
