+1 for the FLIP It's a good start to adaptively optimize the logical execution plan with runtime information.
Thanks, Zhu Xia Sun <xingbe...@gmail.com> 于2024年7月18日周四 18:23写道: > Hi devs, > > Junrui Lee, Lei Yang, and I would like to initiate a discussion about > FLIP-470: Support Adaptive Broadcast Join[1]. > > In general, Broadcast Hash Join is currently the most efficient join > strategy available in Flink. However, its prerequisite is that the input > data on one side must be sufficiently small; otherwise, it may lead to > memory overuse or other issues. Currently, due to the lack of precise > statistics, it is difficult to make accurate estimations during the Flink > SQL Planning phase. For example, when an upstream Filter operator is > present, it is easy to overestimate the size of the table, whereas with > an expansion operator, the table size tends to be underestimated. Moreover, > once the join operator is determined, it cannot be modified at runtime. > > To address this issue, we plan to introduce Adaptive Broadcast Join > capability based on FLIP-468: Introducing StreamGraph-Based Job > Submission[2] > and FLIP-469: Supports Adaptive Optimization of StreamGraph[3]. This will > allow the join operator to be dynamically optimized to Broadcast Join based > on the actual input data volume at runtime and fallback when the > optimization > conditions are not met. > > For more details, please refer to FLIP-470[1]. We look forward to your > feedback. > > Best, > Junrui Lee, Lei Yang and Xia Sun > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-470%3A+Support+Adaptive+Broadcast+Join > [2] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-468%3A+Introducing+StreamGraph-Based+Job+Submission > [3] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-469%3A+Supports+Adaptive+Optimization+of+StreamGraph >