Hi Xuyang,

Thanks for driving this! The state of two-stream joins is a real
headache in stream computing systems.

After reading this FLIP, I have two questions:

1. Can the two sides of the join operator be arbitrary streams, or must each
side come directly from a LookupSource? If arbitrary streams are allowed, how
should the stateful operators along the path be handled?
2. Are there any plans to support strong consistency in the future? That
would be helpful for incremental computing scenarios.



Best regards,

Weijie


On Fri, Apr 25, 2025 at 10:52, Xuyang <xyzhong...@163.com> wrote:

> Hi, devs.
>
> I'd like to start a discussion on FLIP-486: Introduce a new DeltaJoin[1].
>
>
>
>
> In Flink streaming jobs, the large state of Join nodes has been a
> persistent concern for users.
>
> Since streaming jobs are long-running, the state of join generally
> increases in size over time.
>
> Although users can set the state TTL to mitigate this issue, it is not
> applicable to all scenarios and does not provide a fundamental solution.
>
> An oversized state can lead to a series of problems, including but not
> limited to:
>
> 1. Resource bottlenecks in individual tasks
>
> 2. Slow checkpointing, which affects job stability during the
> checkpointing process
>
> 3. Long recovery time from state
>
> In addition, when analyzing the join state, we find that in some
> scenarios, the state within the join actually contains redundant data
> from the source tables.
>
>
>
>
> To address this issue, we aim to introduce Delta Join, which is based on
> the core idea of leveraging a bidirectional lookup join approach to reuse
> source table data as a substitute for join state.
>
>
>
>
> You can find more details in this FLIP. I'm looking forward to your
> comments and feedback.
>
>
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-486%3A+Introduce+A+New+DeltaJoin
>
>
>
>
> --
>
>     Best!
>     Xuyang
