Hi, Ling Miao. Thanks for your advice. I'll think about it and get back to you.
Jianliang Qi

On Mon, Feb 22, 2021 at 5:31 PM ling miao <lingm...@apache.org> wrote:

> Hi JianLiang,
>
> Thank you for your proposal. I think this function is still necessary for
> some large dimension tables.
> This means that data that is not generated according to time can also be
> partitioned.
>
> Of course, since this is a change to metadata, all loads, queries, and
> other DDL operations may need to be changed and developed.
> Please take this into account when designing.
>
> Ling Miao
>
> ye qi <jianliang5...@gmail.com> wrote on Sun, Feb 21, 2021 at 1:12 AM:
>
>> List partition
>>
>> Doris currently only supports Range partitioning, where data is usually
>> partitioned by time columns.
>>
>> However, in some scenarios, users want to partition by some enumerated
>> values of columns, such as by city, etc.
>>
>> Design
>>
>> To add support for List partitioning, the following functional points need
>> to be considered.
>>
>> 1. Support for List partition syntax in creating table statements.
>> 2. Support for adding and deleting List partition syntax.
>> 3. Support for List partitioning in various load operations.
>> 4. Support for List partition pruning during query.
>>
>> List partitioned tables do not need to consider dynamic partitioning.
>>
>> Detailed design
>>
>> Syntax
>>
>> The main changes involved here include:
>>
>> 1. Implementation of the subclass ListPartitionDesc of the parsing class
>>    PartitionDesc
>> 2. Implementation of metadata class PartitionInfo subclass
>>    ListPartitionInfo
>> 3. Support for parsing and checking ListPartitionDesc in CreateTableStmt
>> 4. Support for the creation of List Partition tables in Catalog class.
>> 5. Metadata persistence-related changes.
>>
>> The syntax is referenced from MySQL and Oracle.
>>
>> Single partition column
>>
>> CREATE TABLE tb1 (
>>     k1 int, k2 varchar(128), k3 int, v1 int, v2 int
>> )
>> PARTITION BY LIST(k1)
>> (
>>     PARTITION p1 VALUES IN ("1", "3", "5"),
>>     PARTITION p2 VALUES IN ("2", "4", "6"),
>>     ...
>> )
>> ...
>> ;
>>
>> Multi-partition columns
>>
>> CREATE TABLE tb2 (
>>     k1 int, k2 varchar(128), k3 int, v1 int, v2 int
>> )
>> PARTITION BY LIST(k1, k2)
>> (
>>     PARTITION p1 VALUES IN (("1", "beijing"), ("1", "shanghai")),
>>     PARTITION p2 VALUES IN (("2", "beijing"), ("2", "shanghai"), ("2", "tianjin")),
>>     PARTITION p3 VALUES IN (("3", "beijing")),
>>     ...
>> )
>> ...
>> ;
>>
>> NOTE: Partition values must be unique: the same value must not appear in
>> more than one partition.
>>
>> Add partition
>>
>> ALTER TABLE tb1 ADD PARTITION p4 VALUES IN ("7", "8", "9");
>> ALTER TABLE tb2 ADD PARTITION p4 VALUES IN (("4", "tianjin"));
>>
>> Load
>>
>> The current load methods of Doris include Stream Load, INSERT, Routine
>> Load, Broker Load, Hadoop Load, Spark Load.
>>
>> Among them, Stream Load, INSERT, Routine Load, and Broker Load all use
>> the TabletSink class for data distribution. Our first phase adds List
>> partition support for these load operations.
>>
>> The main changes involved include:
>>
>> 1. Changes related to the Descriptors.TOlapTablePartitionParam structure
>>    in the Thrift structure TOlapTableSink
>> 2. Changes related to the OlapTablePartition object in the OlapTableSink
>>    class on the BE side.
>>
>> Query
>>
>> The query mainly needs to implement the List Partition pruning function.
>>
>> The main changes involved include:
>>
>> 1. Implementing the subclass ListPartitionPruner of PartitionPruner
>>
>> Partition related
>>
>> Support operations related to partitioned tables, such as recover,
>> truncate, temporary partition, restore, replace, etc.
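
For illustration only, here is a minimal Python sketch of three behaviours the quoted proposal describes: the uniqueness check on partition values, routing a row to its partition during load, and pruning partitions for an IN predicate during query. It is not Doris code. The names `ListPartitionTable`, `add_partition`, `route`, and `prune` are hypothetical, and the sketch assumes a partition key is either a single value (one partition column) or a tuple (several partition columns).

```python
from typing import Dict, Hashable, Iterable, List, Set


class ListPartitionTable:
    """Toy model of a List-partitioned table (hypothetical, not Doris code)."""

    def __init__(self) -> None:
        # partition name -> set of partition keys (scalars or tuples)
        self.partitions: Dict[str, Set[Hashable]] = {}
        # reverse index: partition key -> name of the partition that owns it
        self.owner: Dict[Hashable, str] = {}

    def add_partition(self, name: str, values: Iterable[Hashable]) -> None:
        """Add a partition, rejecting duplicate names and duplicate values."""
        if name in self.partitions:
            raise ValueError(f"partition {name!r} already exists")
        new_values: Set[Hashable] = set()
        for value in values:
            if value in new_values:
                raise ValueError(f"value {value!r} is listed twice in {name!r}")
            if value in self.owner:
                raise ValueError(
                    f"value {value!r} already belongs to {self.owner[value]!r}"
                )
            new_values.add(value)
        # Validate everything first, then commit, so a rejected ADD PARTITION
        # leaves the metadata unchanged.
        self.partitions[name] = new_values
        for value in new_values:
            self.owner[value] = name

    def route(self, key: Hashable) -> str:
        """Load path: find the partition a row with this key belongs to."""
        if key not in self.owner:
            raise KeyError(f"no partition accepts key {key!r}")
        return self.owner[key]

    def prune(self, in_values: Iterable[Hashable]) -> List[str]:
        """Query path: partitions that can hold rows for `key IN (in_values)`."""
        return sorted({self.owner[v] for v in in_values if v in self.owner})


tb1 = ListPartitionTable()
tb1.add_partition("p1", ["1", "3", "5"])
tb1.add_partition("p2", ["2", "4", "6"])
tb1.add_partition("p4", ["7", "8", "9"])
print(tb1.prune(["1", "2"]))   # ['p1', 'p2']
print(tb1.prune(["7", "10"]))  # ['p4']
```

The reverse index makes both routing and pruning a dictionary lookup per value, at the cost of keeping it in sync with the partition map. Whether the real implementation would use such an index is an assumption of this sketch, not something the proposal states.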