[ 
https://issues.apache.org/jira/browse/KUDU-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555130#comment-17555130
 ] 

ASF subversion and git services commented on KUDU-2671:
-------------------------------------------------------

Commit 295b4903bc69fabb3cb36f618022d465c91954c7 in kudu's branch 
refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=295b4903b ]

KUDU-2671 more robust convention on specifying range bounds

This patch updates the code of the catalog manager to adhere to a more
robust convention on specifying the information for the range partition
boundaries when creating a table with custom hash schemas per range.

Prior to this patch, the catalog manager required both the
CreateTableRequestPB::split_rows_range_bounds and the
CreateTableRequestPB::partition_schema::custom_hash_schema_ranges fields
to have the same number of elements, assuming the ranges in the former
corresponded exactly to those in the latter, where the latter also
carried the hash schema for each range.  In addition to duplicating the
information unnecessarily, that approach was brittle as an API
convention, since the caller had to keep the two fields in sync.

This patch updates the code to use a new convention: if there is at
least one range partition with a custom hash schema in a CreateTable
RPC, all the information on range boundaries and hash schemas should be
supplied via a single field:
CreateTableRequestPB::partition_schema::custom_hash_schema_ranges.
That's better than the previous convention because:
  * it's more robust, since the range information is no longer
    duplicated across two fields
  * it naturally follows the restriction of not allowing split rows
    along with range partitions that have custom hash schemas
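
A minimal sketch of the convention described above, not the catalog
manager's actual code. The two dataclasses are hypothetical stand-ins
for the protobuf messages (the real fields hold encoded row operations,
not integer bounds). Two further assumptions: a request that fills both
fields is rejected, and a range without its own bucket count falls back
to the table-wide hash schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RangeWithHashSchema:
    # Hypothetical stand-in for one entry of custom_hash_schema_ranges.
    lower: int
    upper: int
    # Assumption: None means "use the table-wide hash schema".
    hash_buckets: Optional[int] = None


@dataclass
class CreateTableRequest:
    # Hypothetical stand-in for CreateTableRequestPB.
    split_rows_range_bounds: List[Tuple[int, int]] = field(default_factory=list)
    custom_hash_schema_ranges: List[RangeWithHashSchema] = field(default_factory=list)


def resolve_ranges(req: CreateTableRequest,
                   table_wide_buckets: int) -> List[Tuple[int, int, int]]:
    """Return (lower, upper, hash_buckets) for every range in the request."""
    if req.custom_hash_schema_ranges:
        # New convention: once any range carries a custom hash schema, all
        # ranges must come from custom_hash_schema_ranges alone.
        if req.split_rows_range_bounds:
            # Assumption: this sketch rejects the request outright.
            raise ValueError(
                "range bounds must be given only via custom_hash_schema_ranges")
        return [
            (r.lower, r.upper,
             table_wide_buckets if r.hash_buckets is None else r.hash_buckets)
            for r in req.custom_hash_schema_ranges
        ]
    # No custom hash schemas: every range uses the table-wide schema.
    return [(lo, hi, table_wide_buckets)
            for lo, hi in req.split_rows_range_bounds]
```

With this shape there is a single source of truth for the ranges, so
no element-count check between two fields is needed.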

Also, I updated the existing tests and added extra test scenarios
to cover the updated functionality.

Change-Id: I14073e72178e6bb85bae719ad377c5bb05f8dd55
Reviewed-on: http://gerrit.cloudera.org:8080/18590
Tested-by: Alexey Serbin <ale...@apache.org>
Reviewed-by: Mahesh Reddy <mre...@cloudera.com>
Reviewed-by: Attila Bukor <abu...@apache.org>


> Change hash number for range partitioning
> -----------------------------------------
>
>                 Key: KUDU-2671
>                 URL: https://issues.apache.org/jira/browse/KUDU-2671
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client, java, master, server
>    Affects Versions: 1.8.0
>            Reporter: yangz
>            Assignee: Mahesh Reddy
>            Priority: Major
>              Labels: feature, roadmap-candidate, scalability
>         Attachments: 屏幕快照 2019-01-24 下午12.03.41.png
>
>
> For our use case, the Kudu schema design isn't flexible enough.
> We create our tables with a range per day, such as dt='20181112', as in a
> Hive table. But our data size changes a lot from day to day: one day it may
> be 50G, and another day 500G. This makes it hard to set the hash schema.
> If it is too big, it is wasteful in most cases. If it is too small, there
> is a performance problem on days with a large amount of data.
>
> So we suggest a solution in which we can change the hash number based on
> the history data of a table.
> For example:
>  # we create the schema with one estimated value.
>  # we collect the data size by day range.
>  # we create each new day range partition based on the collected sizes.
> We have used this feature for half a year, and it works well. We hope this
> feature will be useful for the community. The solution may not be complete.
> Please help us make it better.
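
The three steps in the quoted description could be sketched roughly as
follows. This is only an illustration: the function name, the 10G
per-bucket target, and the lower bound of two buckets are assumptions
made for the example and are not taken from the issue or from Kudu's
API.

```python
import math


def hash_buckets_for(size_gb: float,
                     target_gb_per_bucket: float = 10.0,
                     min_buckets: int = 2) -> int:
    """Pick a hash bucket count for a new day range from an observed size.

    size_gb would come from the sizes collected for earlier day ranges;
    target_gb_per_bucket and min_buckets are hypothetical tuning knobs.
    """
    return max(min_buckets, math.ceil(size_gb / target_gb_per_bucket))


# The two daily sizes mentioned in the issue, under the assumed target.
small_day = hash_buckets_for(50)    # 5 buckets
large_day = hash_buckets_for(500)   # 50 buckets
```

A range created for the larger day then gets more hash buckets than
one created for the smaller day, instead of both sharing one
table-wide count.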



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
