[ 
https://issues.apache.org/jira/browse/KUDU-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393488#comment-17393488
 ] 

ASF subversion and git services commented on KUDU-2671:
-------------------------------------------------------

Commit bca9d73f90d02209e67615c140cd9c5311a6d8fb in kudu's branch 
refs/heads/master from Mahesh Reddy
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=bca9d73 ]

[pruning] KUDU-2671: Pruning compatible with custom hash schemas.

This patch introduces changes to the PartitionPruner class
to be compatible with custom hash bucket schemas per range.

There are three ways to set bounds on a scan.

- Adding predicates (e.g. range/equality)
- Setting lower and upper bound primary keys
- Setting lower and upper bound partition keys

This patch makes the first two methods of setting bounds on a scan
compatible with custom hash bucket schemas per range. The last
method, using partition keys, is unstable and for internal use only.
While that method does not need to be compatible with per-range
hash bucket schemas, the pruning functionality as a whole will not
be complete until PartitionPruner::RemovePartitionKeyRange() is
modified. That work will be done in a follow-up patch.
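
As a rough illustration of why pruning has to consult the hash schema of
each range, here is a minimal Python sketch. It is not Kudu's
PartitionPruner code: RangePartition, bucket_of, and prune are
hypothetical names, the modulo hash is a stand-in for Kudu's real hash
function, and keys are simplified to single integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangePartition:
    lower: int        # inclusive lower bound on the range column
    upper: int        # exclusive upper bound on the range column
    num_buckets: int  # hash bucket count for this range only


def bucket_of(key, num_buckets):
    # Stand-in hash for illustration; Kudu's actual hash function differs.
    return key % num_buckets


def prune(ranges, range_lo, range_hi, hash_eq=None):
    """Return the (range, bucket) pairs a scan still has to visit.

    range_lo is inclusive and range_hi is exclusive. hash_eq is an
    optional equality predicate value on the hashed column.
    """
    kept = []
    for r in ranges:
        # Drop ranges that do not overlap the scan bounds.
        if r.upper <= range_lo or r.lower >= range_hi:
            continue
        if hash_eq is None:
            # No predicate on the hashed column: every bucket may match.
            buckets = range(r.num_buckets)
        else:
            # The bucket depends on this range's own bucket count.
            buckets = [bucket_of(hash_eq, r.num_buckets)]
        kept.extend((r, b) for b in buckets)
    return kept
```

With a table-wide hash schema the bucket for an equality predicate can be
computed once; with per-range schemas it has to be recomputed for every
range that survives the range check, which is what the loop above does.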

Change-Id: I05c37495430f61a2c6f6012c72251138aee465b7
Reviewed-on: http://gerrit.cloudera.org:8080/17643
Reviewed-by: Alexey Serbin <aser...@cloudera.com>
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>


> Change hash number for range partitioning
> -----------------------------------------
>
>                 Key: KUDU-2671
>                 URL: https://issues.apache.org/jira/browse/KUDU-2671
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client, java, master, server
>    Affects Versions: 1.8.0
>            Reporter: yangz
>            Assignee: Mahesh Reddy
>            Priority: Major
>              Labels: feature, roadmap-candidate, scalability
>         Attachments: 屏幕快照 2019-01-24 下午12.03.41.png
>
>
> For our use case, Kudu's schema design isn't flexible enough.
> We create our tables with day ranges such as dt='20181112', as in a Hive table.
> But our data size varies a lot from day to day: one day may have 50G, while 
> another may have 500G. This makes it hard to choose the number of hash 
> buckets. If it is too large, it is wasteful in most cases; if it is too small, 
> performance suffers on days with a large amount of data.
>  
> So we propose a solution in which the hash number can be changed based on the 
> historical data of a table.
> For example:
>  # We create the schema with an estimated value.
>  # We collect the data size for each day range.
>  # We create each new day range partition according to the collected daily sizes.
> We have used this feature for half a year, and it works well. We hope this feature 
> will be useful for the community. The solution may not be complete yet. 
> Please help us make it better.
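
The three steps in the description above could be sketched roughly as
follows. This is only an illustration, not the reporter's implementation:
buckets_for_size is a hypothetical helper, and both the 25G-per-bucket
target and the minimum of 2 buckets are assumptions chosen for the
example.

```python
import math


def buckets_for_size(size_gb, target_gb_per_bucket=25.0, min_buckets=2):
    """Pick a hash bucket count for a new day range from its expected size.

    target_gb_per_bucket and min_buckets are assumed tuning values,
    not figures taken from the issue.
    """
    needed = math.ceil(size_gb / target_gb_per_bucket)
    return max(min_buckets, needed)


# Hypothetical collected sizes per day range, in gigabytes.
history_gb = {"20181112": 50, "20181113": 500}
plan = {day: buckets_for_size(size) for day, size in history_gb.items()}
```

Under these assumed values a 50G day gets 2 buckets and a 500G day gets
20, instead of one fixed count for every range.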



