Hi shuai, Thanks for initiating the discussion. The mini-batch join optimization is very helpful, particularly for optimizing outer join conditions in CDC sources and handling cascade joins. And +1 for the proposal.
However, I don't see any details on the proposed "MiniBatchStreamingJoinOperator", would you mind elaborating more about it? Best, Jane On Wed, Jan 10, 2024 at 10:56 PM Benchao Li <libenc...@apache.org> wrote: > Thanks shuai for driving this, mini-batch Join is a very useful > optimization, +1 for the general idea. > > Regarding the configuration > "table.exec.stream.join.mini-batch-enabled", I'm not sure it's really > necessary. The semantic of changelog emitted by the Join operator is > eventual consistency, so there is no much difference between original > Join and mini-batch Join from this aspect. Besides, introducing more > options would make it more complex for users, harder to understand and > maintain, which we should be careful about. > > One thing about the implementation, could you make the new operator > share the same state definition with the original one? > > shuai xu <xushuai...@gmail.com> 于2024年1月10日周三 21:23写道: > > > > Hi devs, > > > > I’d like to start a discussion on FLIP-415: Introduce a new join > operator to support minibatch[1]. > > > > Currently, when performing cascading connections in Flink, there is a > pain point of record amplification. Every record join operator receives > would trigger join process. However, if records of +I and -D matches , they > could be folded to reduce two times of join process. Besides, records of > -U +U might output 4 records in which two records are redundant when > encountering outer join . > > > > To address this issue, this FLIP introduces a new > MiniBatchStreamingJoinOperator to achieve batch processing which could > reduce number of outputting redundant messages and avoid unnecessary join > processes. > > A new option is added to control the operator to avoid influencing > existing jobs. > > > > Please find more details in the FLIP wiki document [1]. Looking > > forward to your feedback. > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch > > > > Best, > > Xu Shuai > > > > -- > > Best, > Benchao Li >