[ https://issues.apache.org/jira/browse/KUDU-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17561505#comment-17561505 ]
ASF subversion and git services commented on KUDU-2671:
-------------------------------------------------------

Commit 76f475abad0194464fd2e46383be7467f50aedd3 in kudu's branch refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=76f475aba ]

KUDU-2671 introduce RANGE_SPECIFIC_HASH_SCHEMA feature flag

This patch introduces a new RANGE_SPECIFIC_HASH_SCHEMA flag for master
to signal that a Kudu cluster is able to work with tables having
range-specific hash schemas (a.k.a. custom hash schemas per range).
In addition, the C++ client now requires the new flag to be present at
the server side when creating a table having at least one range
partition with a custom hash schema or when adding a new range
partition with a custom hash schema.

The rationale for introducing the flag is the following: if there were
no RANGE_SPECIFIC_HASH_SCHEMA flag and a newer client did not require
the server to have such a flag, the client would not get an error while
trying to perform the following operations against tablet servers of
prior versions:
  * Creating a table having a range partition with a custom hash schema
  * Adding a range partition with a custom hash schema to an existing table

That's because the information on custom hash schemas is provided via
newly added fields in the corresponding protobuf structures, and an old
server would simply ignore those fields, assuming all the ranges to be
created have the table-wide hash schema.

A follow-up patch will add similar functionality for the Kudu Java client.
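The rationale above can be illustrated with a small, self-contained model. This is only a sketch and not Kudu code: the classes RangePartition, CreateTableRequest and Server, the function build_request, and the understands_custom_schemas switch are hypothetical names made up for the illustration. Only the flag name RANGE_SPECIFIC_HASH_SCHEMA comes from the commit message. The sketch assumes a server rejects any request that lists a required feature it does not advertise, and that an old server silently drops fields it does not know.

```python
from dataclasses import dataclass, field
from typing import Optional

# Flag name taken from the commit message; everything else is hypothetical.
RANGE_SPECIFIC_HASH_SCHEMA = "RANGE_SPECIFIC_HASH_SCHEMA"


class UnsupportedFeatureError(Exception):
    """A request requires a feature the server does not advertise."""


@dataclass
class RangePartition:
    lower: str
    upper: str
    # None means "use the table-wide hash schema".
    custom_hash_buckets: Optional[int] = None


@dataclass
class CreateTableRequest:
    table_wide_hash_buckets: int
    ranges: list
    required_features: set = field(default_factory=set)


def build_request(table_wide_hash_buckets, ranges):
    """Client side: require the flag only if some range has a custom schema."""
    req = CreateTableRequest(table_wide_hash_buckets, list(ranges))
    if any(r.custom_hash_buckets is not None for r in req.ranges):
        req.required_features.add(RANGE_SPECIFIC_HASH_SCHEMA)
    return req


class Server:
    def __init__(self, supported_features, understands_custom_schemas):
        self.supported_features = set(supported_features)
        # An old server ignores the newly added per-range field.
        self.understands_custom_schemas = understands_custom_schemas

    def create_table(self, req):
        """Return a mapping of (lower, upper) -> hash bucket count used."""
        missing = req.required_features - self.supported_features
        if missing:
            raise UnsupportedFeatureError(", ".join(sorted(missing)))
        layout = {}
        for r in req.ranges:
            buckets = req.table_wide_hash_buckets
            if self.understands_custom_schemas and r.custom_hash_buckets is not None:
                buckets = r.custom_hash_buckets
            layout[(r.lower, r.upper)] = buckets
        return layout
```

In this model, an old server (no advertised flags, no knowledge of the per-range field) raises UnsupportedFeatureError for a request built by build_request. If the client did not set the required feature, the same old server would accept the request and create the range with the table-wide bucket count, which is the silent mismatch the patch is meant to prevent.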
Change-Id: I256d32003e869939e7aa581b21bbe1e77c1e3aba
Reviewed-on: http://gerrit.cloudera.org:8080/18633
Reviewed-by: Mahesh Reddy <mre...@cloudera.com>
Reviewed-by: Abhishek Chennaka <achenn...@cloudera.com>
Tested-by: Alexey Serbin <ale...@apache.org>
Reviewed-by: Attila Bukor <abu...@apache.org>

> Change hash number for range partitioning
> -----------------------------------------
>
> Key: KUDU-2671
> URL: https://issues.apache.org/jira/browse/KUDU-2671
> Project: Kudu
> Issue Type: Improvement
> Components: client, java, master, server
> Affects Versions: 1.8.0
> Reporter: yangz
> Assignee: Mahesh Reddy
> Priority: Major
> Labels: feature, roadmap-candidate, scalability
> Attachments: 屏幕快照 2019-01-24 下午12.03.41.png
>
>
> For our usage, the Kudu schema design isn't flexible enough.
> We create our tables with one range partition per day, such as
> dt='20181112', as in a Hive table.
> But our data size changes a lot from day to day: one day it may be 50G,
> and another day 500G. In this case it is hard to choose the hash
> schema. If the hash number is too big, it is wasteful in most cases. If
> it is too small, there is a performance problem when the amount of data
> is large.
>
> So we suggest a solution in which the hash number can be changed based
> on the historical data of a table.
> For example:
> # We create the schema with one estimated value.
> # We collect the data size by day range.
> # We create each new day range partition according to the collected day size.
> We have used this feature for half a year, and it works well. We hope
> this feature will be useful for the community. The solution may not be
> complete yet. Please help us make it better.
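
The three steps in the issue description can be sketched as a simple sizing rule. This is an illustration only: the function name hash_buckets_for, the 10G-per-bucket target, and the lower and upper bounds are assumptions chosen for the example, not values from the issue or recommendations from Kudu. The 50G and 500G inputs are the day sizes mentioned by the reporter.

```python
import math


def hash_buckets_for(size_gb, target_gb_per_bucket=10.0,
                     min_buckets=2, max_buckets=64):
    """Pick a hash bucket count for a new day range from an observed size.

    All defaults are hypothetical; tune them for a real workload.
    """
    if size_gb < 0 or target_gb_per_bucket <= 0:
        raise ValueError("size must be >= 0 and target must be > 0")
    wanted = math.ceil(size_gb / target_gb_per_bucket)
    # Clamp so that tiny days still get some parallelism and huge days
    # do not create an unbounded number of tablets.
    return max(min_buckets, min(max_buckets, wanted))


# Day sizes from the issue description: a small day and a large day.
small_day = hash_buckets_for(50)    # 5 buckets with the assumed 10G target
large_day = hash_buckets_for(500)   # 50 buckets with the assumed 10G target
```

With a table-wide hash schema, both days would have to share one bucket count. With a per-range hash schema, each new day range can be created with the count computed from the collected sizes.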