[jira] [Commented] (KUDU-2671) Change hash number for range partitioning

ASF subversion and git services (Jira) Mon, 30 Aug 2021 18:25:10 -0700


    [ https://issues.apache.org/jira/browse/KUDU-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407009#comment-17407009 ]


ASF subversion and git services commented on KUDU-2671:
-------------------------------------------------------

Commit 1e8376f4397a24b981216e88d8d4deb8ab154a1d in kudu's branch 
refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=1e8376f ]

KUDU-2671 update range partitioning with custom hash schema

This patch updates the protobuf data structures used to create Kudu
tables with custom hash partitioning per range. These structures already
exist but have not been released yet, so backward compatibility is not a
concern. With this patch, there is no longer a need for two separate
arrays, one of ranges and one of their hash schemas, that must be the
same size.

I also renamed the 'hash_bucket_schemas' field to 'hash_schema' in
the PartitionSchemaPB data structure.

Change-Id: I37aae56a33170894f30d6cd73a5698d6cbb7a697
Reviewed-on: http://gerrit.cloudera.org:8080/17779
Reviewed-by: Andrew Wong <[email protected]>
Tested-by: Kudu Jenkins
Reviewed-by: Mahesh Reddy <[email protected]>
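The commit above replaces two parallel arrays (ranges, and their hash schemas) with a single array in which each range carries its own hash schema. The following minimal Python sketch illustrates that layout change. It uses hypothetical dataclasses (`HashSchema`, `RangeBounds`, `ParallelArraysRequest`, `RangeWithHashSchema`, `CombinedRequest`) as stand-ins; these are assumptions for illustration and are not the actual Kudu protobuf messages or field names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical stand-ins for illustration only; NOT the real Kudu
# protobuf messages or their field names.

@dataclass
class HashSchema:
    columns: List[str]   # columns hashed into buckets
    num_buckets: int     # number of hash buckets for this range

@dataclass
class RangeBounds:
    lower: str           # inclusive lower bound, e.g. "20181112"
    upper: str           # exclusive upper bound, e.g. "20181113"

# Old layout (sketch): two parallel lists that must stay the same length.
@dataclass
class ParallelArraysRequest:
    ranges: List[RangeBounds] = field(default_factory=list)
    hash_schemas: List[HashSchema] = field(default_factory=list)

    def pairs(self) -> List[Tuple[RangeBounds, HashSchema]]:
        # The same-size invariant has to be checked at runtime.
        if len(self.ranges) != len(self.hash_schemas):
            raise ValueError("ranges and hash_schemas must be the same size")
        return list(zip(self.ranges, self.hash_schemas))

# New layout (sketch): each entry bundles a range with its own hash schema.
@dataclass
class RangeWithHashSchema:
    bounds: RangeBounds
    hash_schema: HashSchema

@dataclass
class CombinedRequest:
    ranges: List[RangeWithHashSchema] = field(default_factory=list)

    def pairs(self) -> List[Tuple[RangeBounds, HashSchema]]:
        # No length check needed: the pairing is part of the structure.
        return [(r.bounds, r.hash_schema) for r in self.ranges]

def to_combined(old: ParallelArraysRequest) -> CombinedRequest:
    """Convert the parallel-array layout into the combined layout."""
    return CombinedRequest([RangeWithHashSchema(b, h) for b, h in old.pairs()])
```

The design point is that, in the combined layout, a range can no longer be mismatched with another range's hash schema, so the "same size" requirement disappears rather than being validated.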


> Change hash number for range partitioning
> -----------------------------------------
>
>                 Key: KUDU-2671
>                 URL: https://issues.apache.org/jira/browse/KUDU-2671
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client, java, master, server
>    Affects Versions: 1.8.0
>            Reporter: yangz
>            Assignee: Mahesh Reddy
>            Priority: Major
>              Labels: feature, roadmap-candidate, scalability
>         Attachments: 屏幕快照 2019-01-24 下午12.03.41.png (screenshot, 2019-01-24, 12:03:41 PM)
>
>
> For our use case, the Kudu schema design isn't flexible enough.
> We range-partition our tables by day, e.g. dt='20181112', in the same way as a Hive table.
> But our data size varies a lot from day to day: one day it may be 50G, another
> day 500G. This makes it hard to choose the hash schema. If the number of hash
> buckets is too large, it is wasteful on most days; if it is too small, performance
> suffers on days with a large amount of data.
>  
> We therefore propose being able to change the number of hash buckets based on
> a table's historical data.
> For example:
>  # Create the schema with an initial estimated value.
>  # Collect the data size for each day range.
>  # Create each new day range partition with a bucket count based on the collected daily size.
> We have been using this feature for half a year, and it works well. We hope it
> will be useful for the community. The solution may not be complete; please help
> us make it better.
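The three-step procedure in the issue description (start from an estimate, collect per-day sizes, size each new day range from that history) can be sketched as a function that picks the hash bucket count for the next day's range partition. This is a hypothetical illustration, not the reporter's implementation and not a Kudu API: the per-bucket target size, the lookback window, and the clamping bounds are all assumed values, and how the resulting count is applied when adding the range partition is outside this sketch.

```python
import math
from typing import Sequence

GIB = 1024 ** 3

def choose_num_buckets(
    daily_sizes_bytes: Sequence[int],
    initial_estimate: int,
    target_bytes_per_bucket: int = 10 * GIB,  # hypothetical tuning knob
    lookback_days: int = 7,                   # hypothetical history window
    min_buckets: int = 2,                     # hypothetical lower clamp
    max_buckets: int = 64,                    # hypothetical upper clamp
) -> int:
    """Pick a hash bucket count for the next day's range partition.

    1. With no history yet, fall back to the initial estimate.
    2. Otherwise, look at the most recent `lookback_days` daily sizes.
    3. Size the new partition so each bucket holds roughly
       `target_bytes_per_bucket`, clamped to [min_buckets, max_buckets].
    """
    if not daily_sizes_bytes:
        return max(min_buckets, min(max_buckets, initial_estimate))
    recent = daily_sizes_bytes[-lookback_days:]
    # Use the largest recent day so peak days are not under-provisioned.
    expected = max(recent)
    buckets = math.ceil(expected / target_bytes_per_bucket)
    return max(min_buckets, min(max_buckets, buckets))
```

Under the assumed 10 GiB-per-bucket target, a history whose recent peak is a 50G day yields 5 buckets, and one whose recent peak is a 500G day yields 50. Taking the maximum over the window is a conservative choice; a mean or percentile would waste less space at the cost of under-sizing on spikes.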



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

