+1 for the FLIP Long-tail tasks caused by skewed data usually pose significant challenges for users. It's great that Flink can mitigate such issues automatically.
Thanks, Zhu Lei Yang <leya5...@gmail.com> 于2024年8月16日周五 11:18写道: > Hi devs, > > > Junrui Lee, Xia Sun and I would like to initiate a discussion about > FLIP-475: Support Adaptive Skewed Join Optimization [1]. > > > In a Join query, when certain keys occur frequently, it can lead to an > uneven distribution of data across partitions. This may affect the > execution performance of Flink jobs, as a single partition with skewed data > can severely downgrade the performance of the entire job. To ensure data is > evenly distributed to downstream tasks, we can use the statistics of the > input to split (and duplicate if needed) skewed and splittable partitions > into balanced partitions at runtime. However, currently, Flink is unable to > accurately determine which partitions are skewed and eligible for splitting > at runtime, and it also lacks the capability to split data within the same > key group. > > > To address this issue, we plan to introduce Adaptive Skewed Join > Optimization capability. This will allow the Join operator to dynamically > split partitions that are skewed and splittable based on the statistics of > the input at runtime, reducing the long-tail problem caused by skewed data. > This FLIP is based on FLIP-469 [2] and also leverages capabilities > introduced in FLIP-470 [3]. > > > For more details, please refer to FLIP-475 [1]. We look forward to your > feedback. > > > Best, > > > Junrui Lee, Xia Sun and Lei Yang > > > [1] > > * > https://cwiki.apache.org/confluence/display/FLINK/FLIP-475%3A+Support+Adaptive+Skewed+Join+Optimization > < > https://cwiki.apache.org/confluence/display/FLINK/FLIP-475%3A+Support+Adaptive+Skewed+Join+Optimization > >* > > [2] > > * > https://cwiki.apache.org/confluence/display/FLINK/FLIP-469%3A+Supports+Adaptive+Optimization+of+StreamGraph > < > https://cwiki.apache.org/confluence/display/FLINK/FLIP-469%3A+Supports+Adaptive+Optimization+of+StreamGraph > >* > > [3] > > * > https://cwiki.apache.org/confluence/display/FLINK/FLIP-470%3A+Support+Adaptive+Broadcast+Join > < > https://cwiki.apache.org/confluence/display/FLINK/FLIP-470%3A+Support+Adaptive+Broadcast+Join > >* >