[jira] [Updated] (FLINK-36576) Improving amount-based data balancing distribution algorithm for DefaultVertexParallelismAndInputInfosDecider

Lei Yang (Jira) Mon, 21 Oct 2024 00:46:06 -0700


     [ https://issues.apache.org/jira/browse/FLINK-36576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Lei Yang updated FLINK-36576:
-----------------------------
    Description: 
Currently, the DefaultVertexParallelismAndInputInfosDecider implements a
balanced distribution algorithm based on the amount of data and the number of
subpartitions. However, it has the following limitations:
 # The decider selects the data distribution algorithm based on whether the
input is AllToAll or Pointwise, which prevents the operator from dynamically
changing the data distribution algorithm.
 # It does not support data-volume-based balanced distribution for Pointwise
inputs.
 # For AllToAll inputs, it does not support splitting the data that belongs to
a specific key, i.e., it cannot resolve data skew caused by a single hot key.
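
To illustrate the third limitation, here is a minimal sketch of amount-based
grouping of whole subpartitions. It is not the decider's actual code: the class
name SubpartitionGrouping, the method groupByBytes and the sample sizes are
hypothetical, and the greedy strategy is an assumption made for illustration.
Because a subpartition is the smallest unit that can be assigned, a subpartition
dominated by one hot key stays oversized no matter how the ranges are cut.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Flink.
final class SubpartitionGrouping {

    /**
     * Greedily groups consecutive subpartitions into index ranges [start, end]
     * so that each range holds at most targetBytes, except when a single
     * subpartition alone already exceeds the target.
     */
    static List<int[]> groupByBytes(long[] bytesPerSubpartition, long targetBytes) {
        List<int[]> ranges = new ArrayList<>();
        int start = 0;
        long accumulated = 0;
        for (int i = 0; i < bytesPerSubpartition.length; i++) {
            long bytes = bytesPerSubpartition[i];
            // Close the current range if adding this subpartition would exceed the target.
            if (i > start && accumulated + bytes > targetBytes) {
                ranges.add(new int[] {start, i - 1});
                start = i;
                accumulated = 0;
            }
            accumulated += bytes;
        }
        if (bytesPerSubpartition.length > 0) {
            ranges.add(new int[] {start, bytesPerSubpartition.length - 1});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Subpartition 3 carries a hot key (100 bytes against a target of 30).
        long[] sizes = {10, 10, 10, 100, 10, 10};
        for (int[] range : groupByBytes(sizes, 30)) {
            System.out.println(Arrays.toString(range));
        }
        // prints [0, 2], [3, 3] and [4, 5] on separate lines
    }
}
```

In this sample the middle range still carries 100 bytes, more than three times
the target, which is the skew that splitting the data of a specific key is
meant to address.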

To address these limitations, we plan to introduce the following improvements:
 # Introduce InterInputsKeyCorrelation and IntraInputKeyCorrelation into the
input characterisation, which allows the operator to flexibly choose the
balanced data distribution algorithm.
 # Introduce a data-volume-based balanced data distribution algorithm for
Pointwise inputs.
 # Introduce the ability to split the data that belongs to a specific key, to
optimise the data-volume-based balanced data distribution algorithm for
AllToAll inputs.
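
One way the proposed splitting could work is sketched below. This is a
hypothetical illustration, not the planned implementation. It assumes that
IntraInputKeyCorrelation means "records with the same key within one input must
reach the same downstream subtask", and that a hot subpartition can be split
along the upstream partitions that produced it. The class HotSubpartitionSplit
and the method splitByUpstreamPartition are invented names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names and semantics are assumptions.
final class HotSubpartitionSplit {

    /**
     * Returns ranges [start, end] of upstream partition indices, each range
     * feeding one downstream subtask for a single hot subpartition.
     * If the input is intra-input key correlated, no split is allowed and the
     * whole partition range is returned as one piece.
     */
    static List<int[]> splitByUpstreamPartition(
            long[] bytesPerUpstreamPartition, long targetBytes, boolean intraInputKeyCorrelated) {
        List<int[]> ranges = new ArrayList<>();
        int n = bytesPerUpstreamPartition.length;
        if (n == 0) {
            return ranges;
        }
        if (intraInputKeyCorrelated) {
            // Same-key records must stay together, so the hot key cannot be split.
            ranges.add(new int[] {0, n - 1});
            return ranges;
        }
        int start = 0;
        long accumulated = 0;
        for (int i = 0; i < n; i++) {
            long bytes = bytesPerUpstreamPartition[i];
            // Start a new piece once the current one would exceed the target.
            if (i > start && accumulated + bytes > targetBytes) {
                ranges.add(new int[] {start, i - 1});
                start = i;
                accumulated = 0;
            }
            accumulated += bytes;
        }
        ranges.add(new int[] {start, n - 1});
        return ranges;
    }

    public static void main(String[] args) {
        long[] hotKeyBytes = {40, 30, 30, 50};
        System.out.println(splitByUpstreamPartition(hotKeyBytes, 60, false).size());
        System.out.println(splitByUpstreamPartition(hotKeyBytes, 60, true).size());
    }
}
```

Under these assumptions the correlation flag, rather than the AllToAll or
Pointwise attribute, decides whether the hot key's data may be divided among
several subtasks.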

  was:
Currently, the DefaultVertexParallelismAndInputInfosDecider is able to 
implement a balanced distribution algorithm based on the amount of data and the 
number of subpartitions, however it also has some limitations:
 # 
Currently, Decider selects the data distribution algorithm via the AllToAll or 
Pointwise attribute of the input, which limits the ability of the operator to 
dynamically modify the data distribution algorithm.
 # 
Doesn't support data volume-based balanced distribution for Pointwise inputs.
 # 
For AllToAll type inputs, it does not support splitting the data corresponding 
to the specific key, i.e., it cannot solve the data skewing caused by 
single-key hotspot.

For that we plan to introduce the following improvements:
 # 
Introducing InterInputsKeyCorrelation and IntraInputKeyCorrelation to the input 
characterisation which allows the operator to flexibly choose the data balanced 
distribution algorithm.
 # 
Introducing a data volume-based data balanced distribution algorithm for 
Pointwise inputs
 # 
Introducing the ability to split data corresponding to the specific key to 
optimise AllToAll's data volume-based data balancing distribution algorithm.


> Improving amount-based data balancing distribution algorithm for 
> DefaultVertexParallelismAndInputInfosDecider
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-36576
>                 URL: https://issues.apache.org/jira/browse/FLINK-36576
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Lei Yang
>            Priority: Major
>
> Currently, the DefaultVertexParallelismAndInputInfosDecider implements a
> balanced distribution algorithm based on the amount of data and the number
> of subpartitions. However, it has the following limitations:
>  # The decider selects the data distribution algorithm based on whether the
> input is AllToAll or Pointwise, which prevents the operator from dynamically
> changing the data distribution algorithm.
>  # It does not support data-volume-based balanced distribution for Pointwise
> inputs.
>  # For AllToAll inputs, it does not support splitting the data that belongs
> to a specific key, i.e., it cannot resolve data skew caused by a single hot
> key.
> To address these limitations, we plan to introduce the following
> improvements:
>  # Introduce InterInputsKeyCorrelation and IntraInputKeyCorrelation into the
> input characterisation, which allows the operator to flexibly choose the
> balanced data distribution algorithm.
>  # Introduce a data-volume-based balanced data distribution algorithm for
> Pointwise inputs.
>  # Introduce the ability to split the data that belongs to a specific key,
> to optimise the data-volume-based balanced data distribution algorithm for
> AllToAll inputs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
