[jira] [Commented] (KUDU-2671) Change hash number for range partitioning

ASF subversion and git services (Jira) Fri, 01 Jul 2022 08:36:25 -0700


    [ 
https://issues.apache.org/jira/browse/KUDU-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17561505#comment-17561505
 ]


ASF subversion and git services commented on KUDU-2671:
-------------------------------------------------------

Commit 76f475abad0194464fd2e46383be7467f50aedd3 in kudu's branch 
refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=76f475aba ]

KUDU-2671 introduce RANGE_SPECIFIC_HASH_SCHEMA feature flag

This patch introduces a new RANGE_SPECIFIC_HASH_SCHEMA flag for master
to signal that a Kudu cluster is able to work with tables having
range-specific hash schemas (a.k.a. custom hash schemas per range).

In addition, the C++ client now requires the new flag to be present at
the server side when creating a table having at least one range
partition with a custom hash schema or when adding a new range partition
with a custom hash schema.

The rationale for introducing the flag is the following: if there were
no RANGE_SPECIFIC_HASH_SCHEMA flag and a newer client were not requiring
the server to have such a flag, the client would not get an error while
trying to perform the following operations against tablet servers
of prior versions:
  * Creating a table having a range partition with a custom hash schema
  * Adding a range partition with a custom hash schema to an existing table
That's because the information on custom hash schemas is provided via
newly added fields in corresponding protobuf structures, and the old
server would simply ignore the fields, assuming all the ranges to be
created have the table-wide hash schema.
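
The failure mode described above can be illustrated with a toy model. This
is a sketch, not Kudu code: the Server class, the create_table() helper, and
the dictionary-based request are hypothetical stand-ins for the server, the
client call, and the protobuf messages. Only the flag name
RANGE_SPECIFIC_HASH_SCHEMA comes from the patch.

```python
RANGE_SPECIFIC_HASH_SCHEMA = "RANGE_SPECIFIC_HASH_SCHEMA"


class Server:
    """Toy server: knows a set of request fields and advertises feature flags."""

    def __init__(self, known_fields, features):
        self.known_fields = set(known_fields)
        self.features = set(features)

    def handle(self, request, required_features=()):
        # Reject the request if the client requires a feature this server lacks.
        missing = set(required_features) - self.features
        if missing:
            raise RuntimeError("unsupported features: " + ", ".join(sorted(missing)))
        # Like a protobuf parser, silently drop fields this version does not know.
        return {k: v for k, v in request.items() if k in self.known_fields}


def create_table(server, ranges, require_flag):
    """Send a create-table request; each range may carry a custom hash schema."""
    request = {
        "table_hash_buckets": 4,
        "ranges": [r["bounds"] for r in ranges],
        "range_hash_schemas": [r.get("hash_buckets") for r in ranges],
    }
    has_custom = any(r.get("hash_buckets") is not None for r in ranges)
    required = [RANGE_SPECIFIC_HASH_SCHEMA] if (require_flag and has_custom) else []
    return server.handle(request, required)


# Hypothetical old and new server versions.
old_server = Server({"table_hash_buckets", "ranges"}, set())
new_server = Server(
    {"table_hash_buckets", "ranges", "range_hash_schemas"},
    {RANGE_SPECIFIC_HASH_SCHEMA},
)
ranges = [{"bounds": ("20181112", "20181113"), "hash_buckets": 16}]

# Without the flag the old server accepts the request but loses the custom schema.
print(create_table(old_server, ranges, require_flag=False))

# With the flag the same request fails loudly instead.
try:
    create_table(old_server, ranges, require_flag=True)
except RuntimeError as e:
    print("rejected:", e)
```

In this model, requiring the flag turns a silent loss of the per-range hash
schema into an explicit error, which is the behavior the patch aims for.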

A follow-up patch will add similar functionality for the Kudu Java client.

Change-Id: I256d32003e869939e7aa581b21bbe1e77c1e3aba
Reviewed-on: http://gerrit.cloudera.org:8080/18633
Reviewed-by: Mahesh Reddy <[email protected]>
Reviewed-by: Abhishek Chennaka <[email protected]>
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Attila Bukor <[email protected]>


> Change hash number for range partitioning
> -----------------------------------------
>
>                 Key: KUDU-2671
>                 URL: https://issues.apache.org/jira/browse/KUDU-2671
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client, java, master, server
>    Affects Versions: 1.8.0
>            Reporter: yangz
>            Assignee: Mahesh Reddy
>            Priority: Major
>              Labels: feature, roadmap-candidate, scalability
>         Attachments: 屏幕快照 2019-01-24 下午12.03.41.png
>
>
> For our usage, the Kudu schema design isn't flexible enough.
> We create our tables with one range partition per day, such as
> dt='20181112', as in a Hive table.
> But our data size changes a lot from day to day: one day it may be 50G,
> another day 500G. This makes it hard to set the hash schema. If the hash
> number is too big, it is wasteful in most cases; if it is too small,
> performance suffers on days with a large amount of data.
>  
> So we suggest a solution in which the hash number can be changed based on
> the historical data of a table.
> For example:
>  # We create the schema with one estimated value.
>  # We collect the data size per day range.
>  # We create each new day range partition according to the collected day size.
> We have used this feature for half a year, and it works well. We hope this
> feature will be useful for the community. The solution may not be complete;
> please help us make it better.
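
The sizing procedure in the issue description can be sketched as follows.
This is only an illustration: the helper name hash_buckets_for(), the target
of 10 GB per hash bucket, and the minimum and maximum bounds are assumptions,
not values from the issue or from Kudu. The 50G and 500G day sizes are the
ones mentioned in the description.

```python
import math


def hash_buckets_for(day_size_gb, target_gb_per_bucket=10.0,
                     min_buckets=2, max_buckets=64):
    """Pick a hash bucket count for one day range from its expected data size."""
    buckets = math.ceil(day_size_gb / target_gb_per_bucket)
    # Clamp so that tiny days still get some parallelism and huge days
    # do not create an unbounded number of tablets.
    return max(min_buckets, min(max_buckets, buckets))


# Collected data size per day range, in GB.
history = {"20181112": 50, "20181113": 500}

# Bucket count to use when creating the range partition for each day.
plan = {day: hash_buckets_for(size) for day, size in history.items()}
print(plan)
```

With these assumed parameters, a small day gets few buckets and a large day
gets many, instead of every day range sharing one table-wide hash schema.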



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
