[ https://issues.apache.org/jira/browse/KUDU-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388303#comment-17388303 ]
ASF subversion and git services commented on KUDU-2671:
-------------------------------------------------------

Commit 392f43f8233f80d993c9b2bf35dbf43930969ff5 in kudu's branch refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=392f43f ]

KUDU-2671 remove semantically duplicate range_hash_schemas

This patch removes the 'range_hash_schemas' field from CreateTableRequestPB
because it semantically duplicated the 'range_hash_schemas' sub-field of the
already existing 'partition_schema' field. The change doesn't break any
compatibility because the client-side code that used the removed field hasn't
yet been released.

In addition, this patch fixes an invalid memory access (sometimes leading to
SIGSEGV) in PartitionSchema::EncodeRangeBounds() when the number of per-range
hash schemas is less than the number of range bounds.

With the removal of the semantically duplicate field, the check for the
validity of per-range hash bucket ranges is now effective. This patch adds a
new test scenario to verify that the validation is in place and to catch
regressions in the future.

I also updated the corresponding code in the C++ client and tests.

This is a follow-up to 23ab89db1 and 586b79132.

Change-Id: Icde3d0b0870fd3a3941fcc91602993ae7ad46266
Reviewed-on: http://gerrit.cloudera.org:8080/17694
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Tested-by: Kudu Jenkins
Reviewed-by: Mahesh Reddy <mre...@cloudera.com>

> Change hash number for range partitioning
> -----------------------------------------
>
>                 Key: KUDU-2671
>                 URL: https://issues.apache.org/jira/browse/KUDU-2671
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client, java, master, server
>    Affects Versions: 1.8.0
>            Reporter: yangz
>            Assignee: Mahesh Reddy
>            Priority: Major
>              Labels: feature, roadmap-candidate, scalability
>         Attachments: 屏幕快照 2019-01-24 下午12.03.41.png
>
>
> For our use case, the Kudu schema design isn't flexible enough.
> We create our tables with day-based range partitions, such as dt='20181112', much like a Hive table.
> But our data size varies a lot from day to day: one day it may be 50 GB, another day 500 GB. That makes it hard to choose a single hash bucket count. If the count is too large, it is wasteful for most days; if it is too small, there is a performance problem on days with a large amount of data.
>
> So we propose a solution: adjust the hash bucket count per range partition based on the table's historical data. For example:
> # We create the schema with an estimated bucket count.
> # We collect the data size for each day range.
> # We create a new day-range partition with a bucket count derived from the collected size.
> We have used this feature for half a year, and it works well. We hope it will be useful for the community. The solution may not be complete; please help us make it better.
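For illustration, here is a minimal C++ client sketch of what the requested per-range hash schemas look like in use: each day range gets its own hash bucket count instead of one table-wide value. It assumes the in-progress client API touched by the commit above (a KuduRangePartition class with add_hash_partitions(), and KuduTableCreator::add_custom_range_partition()); exact class names, signatures, and ownership semantics may differ from what ships in a release.

{code:cpp}
// Sketch only: the per-range partition API below (KuduRangePartition,
// add_hash_partitions(), add_custom_range_partition()) follows the
// in-progress C++ client work referenced by the commit above and may not
// match a released interface exactly.
#include <memory>
#include <string>
#include <vector>

#include "kudu/client/client.h"

using kudu::KuduPartialRow;
using kudu::Status;
using kudu::client::KuduClient;
using kudu::client::KuduColumnSchema;
using kudu::client::KuduRangePartition;   // assumed name of the per-range partition class
using kudu::client::KuduSchema;
using kudu::client::KuduSchemaBuilder;
using kudu::client::KuduTableCreator;
using kudu::client::sp::shared_ptr;

// Create a table range-partitioned on 'dt', where a light day gets 2 hash
// buckets on 'id' and a heavy day gets 16.
Status CreateTableWithPerRangeHashSchemas(const shared_ptr<KuduClient>& client) {
  KuduSchema schema;
  KuduSchemaBuilder b;
  b.AddColumn("dt")->Type(KuduColumnSchema::INT64)->NotNull();
  b.AddColumn("id")->Type(KuduColumnSchema::INT64)->NotNull();
  b.SetPrimaryKey({"dt", "id"});
  KUDU_RETURN_NOT_OK(b.Build(&schema));

  std::unique_ptr<KuduTableCreator> creator(client->NewTableCreator());
  creator->table_name("events")
      .schema(&schema)
      .set_range_partition_columns({"dt"});

  // Range [20181112, 20181113): a light day, only 2 hash buckets on 'id'.
  {
    KuduPartialRow* lower = schema.NewRow();
    KuduPartialRow* upper = schema.NewRow();
    KUDU_RETURN_NOT_OK(lower->SetInt64("dt", 20181112));
    KUDU_RETURN_NOT_OK(upper->SetInt64("dt", 20181113));
    // Assumed ownership model: the partition takes ownership of the bound
    // rows, and the table creator takes ownership of the partition.
    auto* light_day = new KuduRangePartition(lower, upper);
    KUDU_RETURN_NOT_OK(light_day->add_hash_partitions({"id"}, /*num_buckets=*/2));
    creator->add_custom_range_partition(light_day);
  }

  // Range [20181113, 20181114): a heavy day, 16 hash buckets on 'id'.
  {
    KuduPartialRow* lower = schema.NewRow();
    KuduPartialRow* upper = schema.NewRow();
    KUDU_RETURN_NOT_OK(lower->SetInt64("dt", 20181113));
    KUDU_RETURN_NOT_OK(upper->SetInt64("dt", 20181114));
    auto* heavy_day = new KuduRangePartition(lower, upper);
    KUDU_RETURN_NOT_OK(heavy_day->add_hash_partitions({"id"}, /*num_buckets=*/16));
    creator->add_custom_range_partition(heavy_day);
  }

  return creator->Create();
}
{code}

In this sketch the bucket counts (2 and 16) stand in for values derived from the collected per-day data sizes described in the issue; the validation added by the commit is what rejects requests where the per-range hash schemas don't line up with the supplied range bounds.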