Hi Shuai,

Thanks for starting the discussion. The mini-batch join optimization would be
very helpful, particularly for optimizing outer joins over CDC sources and
for handling cascading joins. +1 for the proposal.

However, I don't see any details on the proposed
"MiniBatchStreamingJoinOperator". Would you mind elaborating on it?

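To make my question concrete, here is a minimal sketch of the folding I assume the operator performs within one mini-batch. It is a hypothetical illustration, not the FLIP's design: `MiniBatchFoldSketch`, `Kind`, `Change` and `fold` are made-up names, `Kind` only stands in for Flink's RowKind, and rows are plain strings. Accumulate records (+I/+U) add one to a row's net count, retract records (-U/-D) subtract one, and rows whose count nets to zero are dropped before they ever reach the join.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MiniBatchFoldSketch {

    // Hypothetical stand-in for Flink's RowKind: +I, -U, +U, -D.
    enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    // Hypothetical changelog record: a kind plus the row content as a string.
    record Change(Kind kind, String row) {
        boolean isAccumulate() {
            return kind == Kind.INSERT || kind == Kind.UPDATE_AFTER;
        }
    }

    // Folds one mini-batch: +I/+U add 1 to the row's net count, -U/-D subtract 1.
    // Rows whose net count is zero cancel out and produce no output at all.
    static List<Change> fold(List<Change> batch) {
        // LinkedHashMap keeps first-seen order so the output is deterministic.
        Map<String, Integer> net = new LinkedHashMap<>();
        for (Change c : batch) {
            net.merge(c.row(), c.isAccumulate() ? 1 : -1, Integer::sum);
        }
        List<Change> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : net.entrySet()) {
            int n = e.getValue();
            // Positive counts become inserts, negative counts become deletes.
            Kind kind = n > 0 ? Kind.INSERT : Kind.DELETE;
            for (int i = 0; i < Math.abs(n); i++) {
                out.add(new Change(kind, e.getKey()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Change> batch = List.of(
                new Change(Kind.INSERT, "id=1,v=a"),
                new Change(Kind.DELETE, "id=1,v=a"),   // cancels the +I above
                new Change(Kind.UPDATE_BEFORE, "id=2,v=b"),
                new Change(Kind.UPDATE_AFTER, "id=2,v=c"));
        // Four input records become two, so the join would run twice instead of four times.
        for (Change c : fold(batch)) {
            System.out.println(c.kind() + " " + c.row());
        }
    }
}
```

Running `main` prints `DELETE id=2,v=b` and `INSERT id=2,v=c`: the matching +I/-D pair disappears entirely. Note that this naive version rewrites -U/+U as -D/+I and ignores unique keys and ordering across rows, so I'd be curious how the actual operator handles those cases.
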
Best,
Jane


On Wed, Jan 10, 2024 at 10:56 PM Benchao Li <libenc...@apache.org> wrote:

> Thanks, Shuai, for driving this. Mini-batch join is a very useful
> optimization; +1 for the general idea.
>
> Regarding the configuration
> "table.exec.stream.join.mini-batch-enabled", I'm not sure it's really
> necessary. The changelog emitted by the Join operator has eventual
> consistency semantics, so from this perspective there is not much
> difference between the original Join and mini-batch Join. Besides,
> introducing more options makes things more complex for users, and harder
> to understand and maintain, which we should be careful about.
>
> One question about the implementation: could the new operator share the
> same state definition as the original one?
>
> shuai xu <xushuai...@gmail.com> wrote on Wed, Jan 10, 2024 at 21:23:
> >
> > Hi devs,
> >
> > I’d like to start a discussion on FLIP-415: Introduce a new join
> > operator to support minibatch [1].
> >
> > Currently, cascading joins in Flink suffer from record amplification:
> > every record the join operator receives triggers the join process.
> > However, if a +I record and a -D record match, they could be folded
> > together, saving two join invocations. Besides, with an outer join, a
> > pair of -U/+U records may produce four output records, two of which are
> > redundant.
> >
> > To address this issue, this FLIP introduces a new
> > MiniBatchStreamingJoinOperator that processes records in mini-batches,
> > which reduces redundant output messages and avoids unnecessary join
> > processing.
> > A new option is added to enable the operator, so that existing jobs are
> > not affected.
> >
> > Please find more details in the FLIP wiki document [1]. Looking
> > forward to your feedback.
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch
> >
> > Best,
> > Xu Shuai
>
>
>
> --
>
> Best,
> Benchao Li
>
