[ https://issues.apache.org/jira/browse/FLINK-36576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated FLINK-36576: ----------------------------------- Labels: pull-request-available (was: ) > Improving amount-based data balancing distribution algorithm for > DefaultVertexParallelismAndInputInfosDecider > ------------------------------------------------------------------------------------------------------------- > > Key: FLINK-36576 > URL: https://issues.apache.org/jira/browse/FLINK-36576 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination > Reporter: Lei Yang > Priority: Major > Labels: pull-request-available > > Currently, the DefaultVertexParallelismAndInputInfosDecider is able to > implement a balanced distribution algorithm based on the amount of data and > the number of subpartitions, however it also has some limitations: > # Currently, Decider selects the data distribution algorithm via the > AllToAll or Pointwise attribute of the input, which limits the ability of the > operator to dynamically modify the data distribution algorithm. > # Doesn't support data volume-based balanced distribution for Pointwise > inputs. > # For AllToAll type inputs, it does not support splitting the data > corresponding to the specific key, i.e., it cannot solve the data skewing > caused by single-key hotspot. > For that we plan to introduce the following improvements: > # Introducing InterInputsKeyCorrelation and IntraInputKeyCorrelation to the > input characterisation which allows the operator to flexibly choose the data > balanced distribution algorithm. > # Introducing a data volume-based data balanced distribution algorithm for > Pointwise inputs > # Introducing the ability to split data corresponding to the specific key to > optimise AllToAll's data volume-based data balancing distribution algorithm. -- This message was sent by Atlassian Jira (v8.20.10#820010)