[ https://issues.apache.org/jira/browse/FLINK-27696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yu Li updated FLINK-27696:
--------------------------
    Component/s: Table Store

> Add bin-pack strategy to split the whole bucket data files into several small splits
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-27696
>                 URL: https://issues.apache.org/jira/browse/FLINK-27696
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table Store
>            Reporter: Zheng Hu
>            Assignee: Jingsong Lee
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: table-store-0.2.0
>
>
> We don't have to assign all of a bucket's data files to a single task.
> Instead, we can use an algorithm (such as bin-packing) to split a bucket's
> data files into multiple splits and so improve job parallelism.
>
> For a merge tree table:
>
> Suppose there are these files:
> [1, 2] [3, 4] [5, 180] [5, 190] [200, 600] [210, 700]
>
> Files whose ranges do not intersect are unrelated, so we do not need to put
> all files into one split. We can slice them into multiple splits, which run
> faster when executed in parallel. We should not slice too finely either:
> each split should be as large as possible within a 128 MB target, so we use
> BinPack to do the slicing. The final result will be:
>
> * split1: [1, 2] [3, 4]
> * split2: [5, 180] [5, 190]
> * split3: [200, 600] [210, 700]
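The splitting described in the issue can be sketched roughly as follows. This is a minimal illustration, not the Table Store implementation. It assumes the bracketed pairs are inclusive key ranges, that overlapping files must stay in the same split, and that groups are packed in key order. The class `BinPackSketch`, its methods, and the per-file sizes are all hypothetical; the issue gives no file sizes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the idea in FLINK-27696; names and sizes are assumptions. */
class BinPackSketch {

    /** A data file with an inclusive key range and an assumed size in bytes. */
    static final class DataFile {
        final int minKey;
        final int maxKey;
        final long sizeBytes;

        DataFile(int minKey, int maxKey, long sizeBytes) {
            this.minKey = minKey;
            this.maxKey = maxKey;
            this.sizeBytes = sizeBytes;
        }

        @Override
        public String toString() {
            return "[" + minKey + ", " + maxKey + "]";
        }
    }

    /** Step 1: group files whose key ranges overlap, since they must be merged together. */
    static List<List<DataFile>> groupOverlapping(List<DataFile> files) {
        List<DataFile> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingInt((DataFile f) -> f.minKey));
        List<List<DataFile>> sections = new ArrayList<>();
        List<DataFile> current = new ArrayList<>();
        int currentMax = Integer.MIN_VALUE;
        for (DataFile f : sorted) {
            // A file starting past the largest key seen so far opens a new group.
            if (!current.isEmpty() && f.minKey > currentMax) {
                sections.add(current);
                current = new ArrayList<>();
            }
            current.add(f);
            currentMax = Math.max(currentMax, f.maxKey);
        }
        if (!current.isEmpty()) {
            sections.add(current);
        }
        return sections;
    }

    /** Step 2: pack groups, in key order, into splits of at most targetBytes where possible. */
    static List<List<DataFile>> pack(List<DataFile> files, long targetBytes) {
        List<List<DataFile>> splits = new ArrayList<>();
        List<DataFile> bin = new ArrayList<>();
        long binSize = 0;
        for (List<DataFile> section : groupOverlapping(files)) {
            long sectionSize = 0;
            for (DataFile f : section) {
                sectionSize += f.sizeBytes;
            }
            // Close the current split if adding this group would exceed the target.
            if (!bin.isEmpty() && binSize + sectionSize > targetBytes) {
                splits.add(bin);
                bin = new ArrayList<>();
                binSize = 0;
            }
            bin.addAll(section); // an oversized group still becomes its own split
            binSize += sectionSize;
        }
        if (!bin.isEmpty()) {
            splits.add(bin);
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<DataFile> files = new ArrayList<>();
        // Key ranges come from the issue; the sizes are made up for illustration.
        files.add(new DataFile(1, 2, 60 * mb));
        files.add(new DataFile(3, 4, 60 * mb));
        files.add(new DataFile(5, 180, 64 * mb));
        files.add(new DataFile(5, 190, 64 * mb));
        files.add(new DataFile(200, 600, 64 * mb));
        files.add(new DataFile(210, 700, 64 * mb));
        List<List<DataFile>> splits = pack(files, 128 * mb);
        for (int i = 0; i < splits.size(); i++) {
            System.out.println("split" + (i + 1) + ": " + splits.get(i));
        }
    }
}
```

With the assumed sizes and a 128 MB target, the sketch groups the six files into the same three splits listed in the issue. Packing in key order keeps each split's ranges contiguous; a different packing order (for example, first-fit by size) would be a separate design choice that the issue does not specify.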