Lei Yang created FLINK-36576: -------------------------------- Summary: Improving amount-based data balancing distribution algorithm for DefaultVertexParallelismAndInputInfosDecider Key: FLINK-36576 URL: https://issues.apache.org/jira/browse/FLINK-36576 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination Reporter: Lei Yang
Currently, the DefaultVertexParallelismAndInputInfosDecider is able to implement a balanced distribution algorithm based on the amount of data and the number of subpartitions, however it also has some limitations: # Currently, Decider selects the data distribution algorithm via the AllToAll or Pointwise attribute of the input, which limits the ability of the operator to dynamically modify the data distribution algorithm. # Doesn't support data volume-based balanced distribution for Pointwise inputs. # For AllToAll type inputs, it does not support splitting the data corresponding to the specific key, i.e., it cannot solve the data skewing caused by single-key hotspot. For that we plan to introduce the following improvements: # Introducing InterInputsKeyCorrelation and IntraInputKeyCorrelation to the input characterisation which allows the operator to flexibly choose the data balanced distribution algorithm. # Introducing a data volume-based data balanced distribution algorithm for Pointwise inputs # Introducing the ability to split data corresponding to the specific key to optimise AllToAll's data volume-based data balancing distribution algorithm. -- This message was sent by Atlassian Jira (v8.20.10#820010)