[ 
https://issues.apache.org/jira/browse/KUDU-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567360#comment-17567360
 ] 

ASF subversion and git services commented on KUDU-2671:
-------------------------------------------------------

Commit dc4031f693382df08c0fab1d0c5ac6bc3c203c35 in kudu's branch 
refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=dc4031f69 ]

KUDU-2671 check for duplicate columns in hash schema when adding range

This patch adds validation for range-specific hash schemas when
adding new range partitions.  Without this patch, invalid requests
to add range partitions with duplicate columns across dimensions
of the custom hash schema were accepted, but the tablets could not be
created, so the IsAlterTableDone() RPC timed out.  The patch also
adds test scenarios for both the C++ and Java Kudu clients to make
sure the corresponding error is reported back to the client.  I verified
that the new test scenarios fail as expected when the newly added
hash schema validation code is commented out.

This patch also fixes a few typos in test scenarios from master-test.cc
since the new verification code exposed those mistakes.

Change-Id: Iefe6a97028ae12585ac5496ac8608448ffacd95e
Reviewed-on: http://gerrit.cloudera.org:8080/18728
Reviewed-by: Mahesh Reddy <mre...@cloudera.com>
Tested-by: Alexey Serbin <ale...@apache.org>
Reviewed-by: Abhishek Chennaka <achenn...@cloudera.com>
Reviewed-by: Alexey Serbin <ale...@apache.org>
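
The check described in the commit message above (rejecting a hash schema
in which the same column appears in more than one hash dimension) can be
illustrated with a minimal sketch. This is not the code from the patch:
the data model (a list of hash dimensions, each a list of column names),
the function names, and the error text are all hypothetical and chosen
only for illustration.

```python
from typing import Optional, Sequence


def find_duplicate_hash_column(
    hash_schema: Sequence[Sequence[str]],
) -> Optional[str]:
    """Return the first column seen more than once, or None.

    `hash_schema` is a hypothetical representation: one inner sequence
    of column names per hash dimension. A repeat within one dimension
    is also reported as a duplicate.
    """
    seen = set()
    for dimension in hash_schema:
        for column in dimension:
            if column in seen:
                return column
            seen.add(column)
    return None


def validate_hash_schema(hash_schema: Sequence[Sequence[str]]) -> None:
    """Raise ValueError if any column is used more than once."""
    duplicate = find_duplicate_hash_column(hash_schema)
    if duplicate is not None:
        # Illustrative message only; not the error text Kudu reports.
        raise ValueError(
            f"column {duplicate!r} appears more than once in the hash schema"
        )
```

Running a check of this kind before accepting the request means the
client gets an immediate error instead of waiting on tablets that can
never be created.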


> Change hash number for range partitioning
> -----------------------------------------
>
>                 Key: KUDU-2671
>                 URL: https://issues.apache.org/jira/browse/KUDU-2671
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client, java, master, server
>    Affects Versions: 1.8.0
>            Reporter: yangz
>            Assignee: Mahesh Reddy
>            Priority: Major
>              Labels: feature, roadmap-candidate, scalability
>         Attachments: 屏幕快照 2019-01-24 下午12.03.41.png
>
>
> For our usage, Kudu's schema design isn't flexible enough.
> We create our tables with one range partition per day, such as dt='20181112', like a Hive table.
> But our data size varies a lot from day to day: one day it may be 50G, and 
> another day 500G. This makes it hard to choose the hash 
> schema. If the hash number is too big, it is wasteful in most cases. If it is too small, 
> performance suffers on days with a large amount of data.
>  
> So we suggest a solution: allow the hash number to be changed based on the history data of 
> a table.
> For example:
>  # We create the schema with an estimated value.
>  # We collect the data size for each day range.
>  # We create each new day range partition with a hash number based on the collected sizes.
> We have used this feature for half a year, and it works well. We hope this feature 
> will be useful for the community. The solution may not be complete, so 
> please help us make it better.
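
The sizing step in the issue description above (choosing the hash number
for a new day range from the sizes collected so far) could be sketched
roughly as follows. The function name, the 10 GiB per-tablet target, and
the lower and upper bounds are assumptions made for illustration; none
of them come from the issue or from Kudu itself.

```python
import math


def pick_hash_buckets(
    data_size_gib: float,
    target_tablet_gib: float = 10.0,  # assumed target size per tablet
    min_buckets: int = 2,             # assumed lower bound
    max_buckets: int = 64,            # assumed upper bound
) -> int:
    """Pick a hash bucket count for one range partition.

    Divides the expected data size by the target tablet size, rounds
    up, and clamps the result to [min_buckets, max_buckets].
    """
    if data_size_gib < 0 or target_tablet_gib <= 0:
        raise ValueError("sizes must be non-negative and target positive")
    wanted = math.ceil(data_size_gib / target_tablet_gib)
    return max(min_buckets, min(max_buckets, wanted))


# The two daily sizes mentioned in the issue, under the assumed target:
small_day = pick_hash_buckets(50)   # 50G day
large_day = pick_hash_buckets(500)  # 500G day
```

With these assumed defaults, the 50G day maps to 5 buckets and the 500G
day to 50, so a single table can hold both without over- or
under-partitioning either range.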



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
