noorall opened a new pull request, #25552: URL: https://github.com/apache/flink/pull/25552
## What is the purpose of the change Currently, the DefaultVertexParallelismAndInputInfosDecider is able to implement a balanced distribution algorithm based on the amount of data and the number of subpartitions, however it also has some limitations: 1. Currently, Decider selects the data distribution algorithm via the AllToAll or Pointwise attribute of the input, which limits the ability of the operator to dynamically modify the data distribution algorithm. 2. Doesn't support data volume-based balanced distribution for Pointwise inputs. 3. For AllToAll type inputs, it does not support splitting the data corresponding to the specific key, i.e., it cannot solve the data skewing caused by single-key hotspot. For that we plan to introduce the following improvements: 1. Introducing InterInputsKeyCorrelation and IntraInputKeyCorrelation to the input characterisation which allows the operator to flexibly choose the data balanced distribution algorithm. 2. Introducing a data volume-based data balanced distribution algorithm for Pointwise inputs 3. Introducing the ability to split data corresponding to the specific key to optimise AllToAll's data volume-based data balancing distribution algorithm. ## Brief change log - Introducing InterInputsKeyCorrelation and IntraInputKeyCorrelation. - Introducing amount-based data balanced distribution algorithm for Pointwise. - Introducing the ability to split data corresponding to the specific key for AllToAll ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org