[ 
https://issues.apache.org/jira/browse/KUDU-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258714#comment-17258714
 ] 

ASF subversion and git services commented on KUDU-2671:
-------------------------------------------------------

Commit 23ab89db1a1546d93fc3848052d076a9026b3068 in kudu's branch 
refs/heads/master from Mahesh Reddy
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=23ab89d ]

[master] KUDU-2671: Range specific hashing at table creation time.

This patch updates CreateTableRequestPB to allow different
hash schemas to be defined per range at table creation time.
This new field is appropriately decoded in catalog_manager.cc.

While this patch handles the logic for creating the correct
partitions, it does not update the metadata for either the
table or tablets. The new per-range schemas will need to be
added to the table metadata in a following patch.

The changes to kudu/common include some refactoring and
putting functions back into an anonymous namespace.

Change-Id: I8f0dcbc3324f8f2d6e99b4d169fdf5c7f7dff95d
Reviewed-on: http://gerrit.cloudera.org:8080/16859
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Tested-by: Andrew Wong <aw...@cloudera.com>


> Change hash number for range partitioning
> -----------------------------------------
>
>                 Key: KUDU-2671
>                 URL: https://issues.apache.org/jira/browse/KUDU-2671
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client, java, master, server
>    Affects Versions: 1.8.0
>            Reporter: yangz
>            Assignee: Mahesh Reddy
>            Priority: Major
>              Labels: feature, roadmap-candidate, scalability
>         Attachments: 屏幕快照 2019-01-24 下午12.03.41.png
>
>
> For our usage, the Kudu schema design isn't flexible enough.
> We create our tables with one range per day, such as dt='20181112', as in a Hive table.
> But our data size changes a lot from day to day: one day it may be 50G, while 
> another day it may be 500G. In this case it is hard to set the hash 
> schema. If it is too big, it is wasteful in most cases. If it is too small, 
> there is a performance problem on days with a large amount of data.
>  
> So we suggest a solution in which the hash number can be changed based on the 
> historical data of a table.
> For example:
>  # We create the schema with one estimated value.
>  # We collect the data size for each day range.
>  # We create each new day range partition based on the collected daily size.
> We have used this feature for half a year, and it works well. We hope this feature 
> will be useful to the community. The solution may not be complete yet. 
> Please help us make it better.
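
The three steps in the quoted description could be sketched roughly as
follows. This is only an illustration in Python, not Kudu code: the function
names, the 10 GiB per-bucket target, and the minimum and maximum bucket counts
are all assumptions made for the example.

```python
import math

# Assumed target for how much data one hash bucket should hold.
# The value is hypothetical and would need tuning for a real workload.
TARGET_BYTES_PER_BUCKET = 10 * 1024**3  # 10 GiB


def buckets_for_size(size_bytes, min_buckets=2, max_buckets=64):
    """Pick a hash bucket count proportional to the expected data size."""
    wanted = math.ceil(size_bytes / TARGET_BYTES_PER_BUCKET)
    # Clamp so tiny days still get some parallelism and huge days stay bounded.
    return max(min_buckets, min(max_buckets, wanted))


def plan_partitions(daily_sizes):
    """Give each day range its own bucket count and list its partitions.

    daily_sizes maps a day string such as '20181112' to a size in bytes,
    for example a size collected from earlier days of the same table.
    """
    plan = {}
    for day, size_bytes in daily_sizes.items():
        n = buckets_for_size(size_bytes)
        # One partition per (range, hash bucket) pair.
        plan[day] = [(day, bucket) for bucket in range(n)]
    return plan
```

With these assumed numbers, a 50 GiB day would get 5 buckets and a 500 GiB
day would get 50, so each range is sized on its own instead of sharing one
table-wide hash schema.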



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
