Hi devs,

Junrui Lee, Lei Yang, and I would like to initiate a discussion about
FLIP-470: Support Adaptive Broadcast Join[1].

In general, Broadcast Hash Join is currently the most efficient join
strategy available in Flink. However, it requires the input data on one
side to be sufficiently small; otherwise, it may lead to excessive memory
usage or other issues. Currently, due to the lack of precise statistics,
it is difficult to make accurate estimates during the Flink SQL planning
phase. For example, when an upstream Filter operator is present, the table
size is easily overestimated, whereas with an expansion operator it tends
to be underestimated. Moreover, once the join operator is determined, it
cannot be modified at runtime.

To address this issue, we plan to introduce an Adaptive Broadcast Join
capability based on FLIP-468: Introducing StreamGraph-Based Job
Submission[2] and FLIP-469: Supports Adaptive Optimization of
StreamGraph[3]. This will allow the join operator to be dynamically
optimized to a Broadcast Join based on the actual input data volume at
runtime, and to fall back when the optimization conditions are not met.
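
As a rough illustration of the runtime decision described above, here is a
minimal sketch. It is not the design or API proposed in the FLIP: the class,
enum, and method names are hypothetical, the threshold is an arbitrary
example value, and it assumes an inner join where either side may be
broadcast.

```java
public class AdaptiveJoinSketch {

    /** Hypothetical outcomes of the runtime decision. */
    public enum JoinStrategy { BROADCAST_LEFT, BROADCAST_RIGHT, FALLBACK }

    /**
     * Picks a strategy from the actual input sizes observed at runtime.
     * Assumption: an inner join, so either side is eligible for broadcast.
     */
    public static JoinStrategy choose(long leftBytes, long rightBytes, long thresholdBytes) {
        long smaller = Math.min(leftBytes, rightBytes);
        if (smaller > thresholdBytes) {
            // Neither side is small enough: keep the originally planned join.
            return JoinStrategy.FALLBACK;
        }
        // Broadcast the smaller side; ties go to the left side.
        return leftBytes <= rightBytes
                ? JoinStrategy.BROADCAST_LEFT
                : JoinStrategy.BROADCAST_RIGHT;
    }

    public static void main(String[] args) {
        long threshold = 10L * 1024 * 1024; // example value only
        System.out.println(choose(2L * 1024 * 1024, 500L * 1024 * 1024, threshold));
        System.out.println(choose(800L * 1024 * 1024, 500L * 1024 * 1024, threshold));
    }
}
```

The point of the sketch is only that the decision uses sizes measured after
the upstream operators have run, rather than planner estimates.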

For more details, please refer to FLIP-470[1]. We look forward to your
feedback.

Best,
Junrui Lee, Lei Yang and Xia Sun

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-470%3A+Support+Adaptive+Broadcast+Join
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-468%3A+Introducing+StreamGraph-Based+Job+Submission
[3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-469%3A+Supports+Adaptive+Optimization+of+StreamGraph
