Hi devs,
Junrui Lee, Xia Sun and I would like to initiate a discussion about FLIP-475: Support Adaptive Skewed Join Optimization [1]. In a Join query, when certain keys occur frequently, it can lead to an uneven distribution of data across partitions. This may affect the execution performance of Flink jobs, as a single partition with skewed data can severely downgrade the performance of the entire job. To ensure data is evenly distributed to downstream tasks, we can use the statistics of the input to split (and duplicate if needed) skewed and splittable partitions into balanced partitions at runtime. However, currently, Flink is unable to accurately determine which partitions are skewed and eligible for splitting at runtime, and it also lacks the capability to split data within the same key group. To address this issue, we plan to introduce Adaptive Skewed Join Optimization capability. This will allow the Join operator to dynamically split partitions that are skewed and splittable based on the statistics of the input at runtime, reducing the long-tail problem caused by skewed data. This FLIP is based on FLIP-469 [2] and also leverages capabilities introduced in FLIP-470 [3]. For more details, please refer to FLIP-475 [1]. We look forward to your feedback. Best, Junrui Lee, Xia Sun and Lei Yang [1] *https://cwiki.apache.org/confluence/display/FLINK/FLIP-475%3A+Support+Adaptive+Skewed+Join+Optimization <https://cwiki.apache.org/confluence/display/FLINK/FLIP-475%3A+Support+Adaptive+Skewed+Join+Optimization>* [2] *https://cwiki.apache.org/confluence/display/FLINK/FLIP-469%3A+Supports+Adaptive+Optimization+of+StreamGraph <https://cwiki.apache.org/confluence/display/FLINK/FLIP-469%3A+Supports+Adaptive+Optimization+of+StreamGraph>* [3] *https://cwiki.apache.org/confluence/display/FLINK/FLIP-470%3A+Support+Adaptive+Broadcast+Join <https://cwiki.apache.org/confluence/display/FLINK/FLIP-470%3A+Support+Adaptive+Broadcast+Join>*