[ 
https://issues.apache.org/jira/browse/KUDU-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17561944#comment-17561944
 ] 

ASF subversion and git services commented on KUDU-2671:
-------------------------------------------------------

Commit b746978c71ce4a95b69d49c43d0ac852909a8b4e in kudu's branch 
refs/heads/master from Mahesh Reddy
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=b746978c7 ]

KUDU-2671: Follow up pruning patch.

This patch flattens the pruner's result set into a one-dimensional
container. The new container holds only the partition key ranges
and no longer stores the range bounds.
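
As a rough illustration of the flattening described above, the sketch below collapses a nested result (one entry per range, each carrying its bounds and its partition key ranges) into a flat list of partition key ranges. It is not the Kudu implementation: the names PartitionKeyRange, RangeWithKeyRanges, and flatten_pruner_result are hypothetical, and the byte-string keys are placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PartitionKeyRange:
    """Hypothetical stand-in for one partition key range to scan."""
    start: bytes  # inclusive lower bound
    end: bytes    # exclusive upper bound


@dataclass
class RangeWithKeyRanges:
    """Hypothetical nested entry: range bounds plus the key ranges under them."""
    range_bounds: Tuple[bytes, bytes]
    key_ranges: List[PartitionKeyRange]


def flatten_pruner_result(nested: List[RangeWithKeyRanges]) -> List[PartitionKeyRange]:
    """Drop the range bounds and keep the key ranges in their original order."""
    return [key_range for entry in nested for key_range in entry.key_ranges]
```

A caller that only iterates over key ranges no longer needs to know which range bounds produced each one, which is the simplification the flat container allows in this sketch.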

Currently, full scans using KuduScanner with no predicates are
functional. Scans with range predicates are also functional on
tables with covering ranges as well as on tables with
non-covering ranges.

A few test cases in flex_partitioning_client-test are commented
out. These test cases involve a scan with range predicates that
are both out of bounds. They fail because the non-covering range
case is triggered in scanner_internal, and that function returns
early before proxy_ is set. The tests then fail at Check(proxy_)
in KuduScanner::NextBatch within client.cc.

Scanning tables with range-specific hash schemas using KuduScanTokens
is not yet supported. A follow-up patch should address this deficiency.

The scan token tests with custom hash schemas are failing when
verifying the tablet info. It seems that the data_ field of the
KuduTablets isn't set.

Change-Id: I3a1bf5344c0ef856072d3ed102714dce5ba21060
Reviewed-on: http://gerrit.cloudera.org:8080/17879
Tested-by: Alexey Serbin <ale...@apache.org>
Reviewed-by: Alexey Serbin <ale...@apache.org>


> Change hash number for range partitioning
> -----------------------------------------
>
>                 Key: KUDU-2671
>                 URL: https://issues.apache.org/jira/browse/KUDU-2671
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client, java, master, server
>    Affects Versions: 1.8.0
>            Reporter: yangz
>            Assignee: Mahesh Reddy
>            Priority: Major
>              Labels: feature, roadmap-candidate, scalability
>         Attachments: 屏幕快照 2019-01-24 下午12.03.41.png
>
>
> For our usage, Kudu's schema design isn't flexible enough.
> We create our table with one range partition per day, such as dt='20181112',
> as in a Hive table. But our data size varies a lot from day to day: one day
> may have 50G, while another day may have 500G. This makes it hard to choose
> the hash schema. If the hash number is too large, it is wasteful in most
> cases. If it is too small, performance suffers on days with a large amount
> of data.
>  
> So we suggest a solution in which the hash number can be changed based on
> the historical data of a table.
> For example:
>  # We create the schema with one estimated value.
>  # We collect the data size for each day range.
>  # We create each new day range partition based on the collected daily sizes.
> We have used this feature for half a year, and it works well. We hope this
> feature will be useful for the community. The solution may not be complete.
> Please help us make it better.
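
The three steps in the issue description can be sketched as a sizing rule that derives a hash bucket count for a new day's range from the data sizes observed so far. This is only an illustration under stated assumptions: the function name, the 10 GiB target per bucket, and the floor of 2 buckets are hypothetical and do not come from the issue or from Kudu.

```python
import math


def hash_buckets_for(size_gib: float,
                     target_gib_per_bucket: float = 10.0,
                     min_buckets: int = 2) -> int:
    """Pick a bucket count so each bucket holds roughly the target size.

    target_gib_per_bucket and min_buckets are assumed values for illustration.
    """
    if size_gib < 0 or target_gib_per_bucket <= 0:
        raise ValueError("sizes must be non-negative and the target positive")
    return max(min_buckets, math.ceil(size_gib / target_gib_per_bucket))


# Step 2 of the description: sizes collected per day range (made-up sample data
# matching the 50G and 500G figures in the description).
observed_gib = {"20181112": 50.0, "20181113": 500.0}

# Step 3: a bucket count per new day range, derived from the collected sizes.
planned = {day: hash_buckets_for(size) for day, size in observed_gib.items()}
```

With the assumed 10 GiB target, the 50G day gets 5 buckets and the 500G day gets 50, so small days are not over-partitioned and large days are not under-partitioned.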



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
