[jira] [Commented] (KUDU-2671) Change hash number for range partitioning

ASF subversion and git services (Jira) Wed, 14 Apr 2021 12:02:07 -0700


    [ 
https://issues.apache.org/jira/browse/KUDU-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17321257#comment-17321257
 ]


ASF subversion and git services commented on KUDU-2671:
-------------------------------------------------------

Commit 2dda869456659a36247eb89f5b9e5e3837e5f8a3 in kudu's branch 
refs/heads/master from Mahesh Reddy
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=2dda869 ]

KUDU-2671: Adds new field to PartitionSchema.

This patch introduces a new field to PartitionSchema that pairs range bounds
with their respective hash bucket schemas. Any code that assumes the same
hash schema is used for every range will need this new field. Some of the
more important cases include partition pruning and many of the internal
PartitionSchema functions.

I moved RowOperationsPB to a separate .proto file due to some circular
dependency issues between common.proto and wire_protocol.proto. Most of
the proto changes in this patch revolve around this change.

Change-Id: Ic5d8615ab9967fdb40292b9c77eb68a19baeca1d
Reviewed-on: http://gerrit.cloudera.org:8080/17025
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <[email protected]>
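
The commit above pairs each range's bounds with its own hash bucket schema inside PartitionSchema. The following Python sketch illustrates that idea only; it is not Kudu's C++ implementation. The names `HashDimension`, `RangeWithHashSchema` and `PartitionSchemaSketch`, the fallback to a table-wide hash schema for ranges without a custom entry, and the use of `zlib.crc32` as the hash function are all assumptions made for illustration.

```python
import zlib
from dataclasses import dataclass, field
from math import prod
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class HashDimension:
    """Hypothetical: columns hashed together and their bucket count."""
    columns: Tuple[str, ...]
    num_buckets: int


@dataclass(frozen=True)
class RangeWithHashSchema:
    """Hypothetical: a range [lower, upper) paired with its own hash schema."""
    lower: str
    upper: str
    hash_schema: Tuple[HashDimension, ...]


@dataclass
class PartitionSchemaSketch:
    table_hash_schema: Tuple[HashDimension, ...]
    ranges_with_hash_schemas: List[RangeWithHashSchema] = field(default_factory=list)

    def hash_schema_for(self, range_key: str) -> Tuple[HashDimension, ...]:
        # Find the range containing the key; assumption: keys outside every
        # custom range fall back to the table-wide hash schema.
        for r in self.ranges_with_hash_schemas:
            if r.lower <= range_key < r.upper:
                return r.hash_schema
        return self.table_hash_schema

    def tablets_for(self, range_key: str) -> int:
        # A range is split into the product of its per-dimension bucket counts.
        return prod(d.num_buckets for d in self.hash_schema_for(range_key))

    def buckets_for_row(self, range_key: str, row: Dict[str, str]) -> Tuple[int, ...]:
        # One bucket index per hash dimension; crc32 is only a stand-in hash.
        buckets = []
        for dim in self.hash_schema_for(range_key):
            key = "\x00".join(row[c] for c in dim.columns).encode("utf-8")
            buckets.append(zlib.crc32(key) % dim.num_buckets)
        return tuple(buckets)


# Daily ranges keyed by dt strings; one large day gets more buckets.
schema = PartitionSchemaSketch(
    table_hash_schema=(HashDimension(("user_id",), 4),),
    ranges_with_hash_schemas=[
        RangeWithHashSchema("20181112", "20181113", (HashDimension(("user_id",), 16),)),
    ],
)
print(schema.tablets_for("20181111"))  # 4 (table-wide schema)
print(schema.tablets_for("20181112"))  # 16 (custom schema for that day)
```

With per-range schemas, code such as partition pruning can no longer derive a row's bucket from a single table-wide bucket count; it first has to look up the schema of the range the row falls in, which is why the commit says such code paths need the new field.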


> Change hash number for range partitioning
> -----------------------------------------
>
>                 Key: KUDU-2671
>                 URL: https://issues.apache.org/jira/browse/KUDU-2671
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client, java, master, server
>    Affects Versions: 1.8.0
>            Reporter: yangz
>            Assignee: Mahesh Reddy
>            Priority: Major
>              Labels: feature, roadmap-candidate, scalability
>         Attachments: 屏幕快照 2019-01-24 下午12.03.41.png (screenshot, 2019-01-24 12.03.41 PM)
>
>
> For our usage, the Kudu schema design isn't flexible enough.
> We create our tables with one range partition per day, such as dt='20181112',
> as we would for a Hive table. But our data size changes a lot from day to day:
> one day it may be 50G, but another day it may be 500G. In this case, it is hard
> to choose the hash schema. If the number of hash buckets is too large, it is
> wasteful on most days; if it is too small, performance suffers on days with a
> large amount of data.
>  
> So we suggest a solution: allow changing the number of hash buckets for new
> range partitions based on the table's historical data.
> For example:
>  # We create the schema with one estimated value.
>  # We collect the data size for each day range.
>  # We create each new day range partition with a bucket count derived from the collected sizes.
> We have used this feature for half a year, and it works well. We hope this
> feature will be useful for the community. The solution may not be complete
> yet, so please help us make it better.
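
The three steps in the issue description can be sketched as a small sizing policy. This is a hypothetical Python example, not a Kudu API: the issue does not say how the next day's size is estimated, so `choose_num_buckets`, the maximum over the last seven days as the estimator, the 10G-per-tablet target, and the bucket limits are all assumptions.

```python
import math
from typing import Sequence


def choose_num_buckets(daily_sizes_gb: Sequence[float],
                       default_buckets: int = 8,
                       target_tablet_gb: float = 10.0,
                       window_days: int = 7,
                       min_buckets: int = 2,
                       max_buckets: int = 64) -> int:
    """Hypothetical policy: bucket count for the next day's range partition."""
    if not daily_sizes_gb:
        # Step 1: no history yet, so use the initial estimated value.
        return default_buckets
    # Step 2: estimate the next day's size from the collected history; the
    # maximum over a recent window is a conservative (assumed) estimator.
    estimate = max(daily_sizes_gb[-window_days:])
    # Step 3: size the new range so each tablet holds about target_tablet_gb,
    # clamped to a sane range of bucket counts.
    buckets = math.ceil(estimate / target_tablet_gb)
    return max(min_buckets, min(max_buckets, buckets))


print(choose_num_buckets([]))             # 8
print(choose_num_buckets([50.0]))         # 5
print(choose_num_buckets([50.0, 500.0]))  # 50
```

For comparison, a fixed count of 8 buckets would put roughly 6.25G per tablet on a 50G day and 62.5G per tablet on a 500G day; the sketch instead picks 5 buckets after a 50G day and 50 after a 500G day, keeping tablets near the assumed 10G target. Taking the recent maximum over-provisions after a single spike; a mean or percentile would be a reasonable alternative estimator.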



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
