Re: [DISCUSS] FLIP-470: Support Adaptive Broadcast Join

Zhu Zhu Fri, 19 Jul 2024 02:40:58 -0700

+1 for the FLIP

It's a good first step toward adaptively optimizing the logical execution
plan with runtime information.


Thanks,
Zhu

Xia Sun <[email protected]> wrote on Thu, Jul 18, 2024 at 18:23:

> Hi devs,
>
> Junrui Lee, Lei Yang, and I would like to initiate a discussion about
> FLIP-470: Support Adaptive Broadcast Join[1].
>
> In general, Broadcast Hash Join is currently the most efficient join
> strategy available in Flink. However, its prerequisite is that the input
> data on one side must be sufficiently small; otherwise, it may lead to
> memory overuse or other issues. Currently, due to the lack of precise
> statistics, it is difficult to make accurate estimations during the Flink
> SQL Planning phase. For example, when an upstream Filter operator is
> present, it is easy to overestimate the size of the table, whereas with
> an expansion operator, the table size tends to be underestimated. Moreover,
> once the join operator is determined, it cannot be modified at runtime.
>
> To address this issue, we plan to introduce an Adaptive Broadcast Join
> capability based on FLIP-468: Introducing StreamGraph-Based Job
> Submission[2] and FLIP-469: Supports Adaptive Optimization of
> StreamGraph[3]. This will allow the join operator to be dynamically
> optimized to Broadcast Join based on the actual input data volume at
> runtime, and to fall back when the optimization conditions are not met.
>
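To make the runtime decision described above concrete, here is a minimal sketch of the idea, not the FLIP's actual design. It assumes the only condition is that the smaller input fits under a size threshold; the class, the enum, the method name, and the threshold parameter are all hypothetical and do not correspond to real Flink APIs or configuration options. The real conditions (for example, restrictions by join type) are defined in FLIP-470 itself.

```java
// Hypothetical illustration only: none of these names exist in Flink.
final class AdaptiveJoinSketch {

    enum JoinStrategy {
        BROADCAST_HASH_JOIN, // broadcast the small side to every join subtask
        FALLBACK_JOIN        // keep the join chosen at planning time
    }

    /**
     * Picks a join strategy from input sizes observed at runtime.
     *
     * @param leftBytes      actual size of the left input, in bytes
     * @param rightBytes     actual size of the right input, in bytes
     * @param thresholdBytes assumed upper bound for a broadcastable input
     */
    static JoinStrategy chooseStrategy(long leftBytes, long rightBytes, long thresholdBytes) {
        long smallerSide = Math.min(leftBytes, rightBytes);
        // Broadcast only when the smaller input is known to fit the budget;
        // otherwise fall back to the originally planned join.
        return smallerSide <= thresholdBytes
                ? JoinStrategy.BROADCAST_HASH_JOIN
                : JoinStrategy.FALLBACK_JOIN;
    }

    public static void main(String[] args) {
        long threshold = 10L * 1024 * 1024; // assumed 10 MiB budget
        System.out.println(chooseStrategy(1L << 20, 1L << 30, threshold));
        System.out.println(chooseStrategy(1L << 30, 1L << 31, threshold));
    }
}
```

The point of the sketch is that the decision uses sizes measured after the upstream stages have run, so a Filter or an expansion operator no longer skews the estimate the way it can at planning time.
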
> For more details, please refer to FLIP-470[1]. We look forward to your
> feedback.
>
> Best,
> Junrui Lee, Lei Yang and Xia Sun
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-470%3A+Support+Adaptive+Broadcast+Join
> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-468%3A+Introducing+StreamGraph-Based+Job+Submission
> [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-469%3A+Supports+Adaptive+Optimization+of+StreamGraph
>
