[jira] [Commented] (KUDU-2671) Change hash number for range partitioning

ASF subversion and git services (Jira) Thu, 30 Jun 2022 11:48:17 -0700


    [ 
https://issues.apache.org/jira/browse/KUDU-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17561145#comment-17561145
 ]


ASF subversion and git services commented on KUDU-2671:
-------------------------------------------------------

Commit 1889d4c44385fec5efeeb2d287d9ab7a3544dcfe in kudu's branch 
refs/heads/master from Abhishek Chennaka
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=1889d4c44 ]

[c++ client] KUDU-2671  Custom hash schema alter table support

This patch adds public methods to the C++ client to alter a table by
adding a new range partition with custom hash sub-partitioning. It
uses KuduTableCreator::KuduRangePartition() for this purpose. The
table_alterer-internal classes and methods are changed to use
KuduRangePartition() to store the range bounds as well as the custom
hash schema information.

The patch includes tests that add and drop ranges with custom hash
schemas by altering the table. The tests also write data into these
partitions and read it back.

The remaining work for this patch is to rebase on top of
https://gerrit.cloudera.org/#/c/17879/ and to add test cases for
scans with predicates on these partitions.

Change-Id: Id4b1e306cca096d9479f06669cc22cc40d77fb42
Reviewed-on: http://gerrit.cloudera.org:8080/18663
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Alexey Serbin <[email protected]>


> Change hash number for range partitioning
> -----------------------------------------
>
>                 Key: KUDU-2671
>                 URL: https://issues.apache.org/jira/browse/KUDU-2671
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client, java, master, server
>    Affects Versions: 1.8.0
>            Reporter: yangz
>            Assignee: Mahesh Reddy
>            Priority: Major
>              Labels: feature, roadmap-candidate, scalability
>         Attachments: 屏幕快照 2019-01-24 下午12.03.41.png
>
>
> For our usage, Kudu's schema design isn't flexible enough.
> We create our tables with one range partition per day, such as
> dt='20181112', like a Hive table.
> But our data size varies a lot from day to day: one day it may be 50G, and 
> another day 500G. This makes it hard to choose the hash schema. If the hash 
> number is too big, it is wasteful in most cases. If it is too small, 
> performance suffers on days with a large amount of data.
>  
> So we suggest a solution: allow the hash number to be changed based on the 
> history data of a table.
> For example:
>  # We create the schema with one estimated value.
>  # We collect the data size by day range.
>  # We create each new day range partition using the collected day size.
> We have used this feature for half a year, and it works well. We hope this 
> feature will be useful for the community. The solution may not be complete; 
> please help us make it better.
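
The three steps in the description above can be sketched as a small
sizing helper. This is only an illustration under stated assumptions,
not code from the patch or from the reporter's implementation: the
function name ChooseHashBuckets, the target size per tablet, and the
minimum and maximum bucket limits are all hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not part of the Kudu client API): pick the number
// of hash buckets for a new day range partition from the data size
// collected for earlier days.
//
// Assumptions made for illustration only:
//   - target_bytes_per_tablet is a tuning value chosen by the operator;
//   - min_buckets and max_buckets are operator-chosen limits.
int32_t ChooseHashBuckets(int64_t expected_bytes,
                          int64_t target_bytes_per_tablet,
                          int32_t min_buckets,
                          int32_t max_buckets) {
  // Fall back to the minimum when there is no usable history.
  if (expected_bytes <= 0 || target_bytes_per_tablet <= 0) {
    return min_buckets;
  }
  // Ceiling division, written to avoid overflow for very large sizes.
  int64_t buckets = expected_bytes / target_bytes_per_tablet;
  if (expected_bytes % target_bytes_per_tablet != 0) {
    ++buckets;
  }
  // Keep the result within the configured limits.
  buckets = std::max<int64_t>(buckets, min_buckets);
  buckets = std::min<int64_t>(buckets, max_buckets);
  return static_cast<int32_t>(buckets);
}
```

With an assumed target of 10 GiB per tablet and limits of 2 and 64, a
50 GiB day maps to 5 buckets and a 500 GiB day maps to 50 buckets. The
result would then be used as the bucket count of the custom hash schema
for the new range partition.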



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
