Hi Jane, Thanks for your reminder! I missed this.
I updated the FLIP with the UML of MiniBatchStreamingJoinOperator and linking my POC implementation as reference. They are placed in the part of Proposed Changes. Best, Xu Shuai > 2024年1月11日 11:18,Jane Chan <qingyue....@gmail.com> 写道: > > Hi shuai, > > Thanks for initiating the discussion. The mini-batch join optimization is > very helpful, particularly for optimizing outer join conditions in CDC > sources and handling cascade joins. And +1 for the proposal. > > However, I don't see any details on the proposed > "MiniBatchStreamingJoinOperator", would you mind elaborating more about it? > > Best, > Jane > > > On Wed, Jan 10, 2024 at 10:56 PM Benchao Li <libenc...@apache.org> wrote: > >> Thanks shuai for driving this, mini-batch Join is a very useful >> optimization, +1 for the general idea. >> >> Regarding the configuration >> "table.exec.stream.join.mini-batch-enabled", I'm not sure it's really >> necessary. The semantic of changelog emitted by the Join operator is >> eventual consistency, so there is no much difference between original >> Join and mini-batch Join from this aspect. Besides, introducing more >> options would make it more complex for users, harder to understand and >> maintain, which we should be careful about. >> >> One thing about the implementation, could you make the new operator >> share the same state definition with the original one? >> >> shuai xu <xushuai...@gmail.com> 于2024年1月10日周三 21:23写道: >>> >>> Hi devs, >>> >>> I’d like to start a discussion on FLIP-415: Introduce a new join >> operator to support minibatch[1]. >>> >>> Currently, when performing cascading connections in Flink, there is a >> pain point of record amplification. Every record join operator receives >> would trigger join process. However, if records of +I and -D matches , they >> could be folded to reduce two times of join process. Besides, records of >> -U +U might output 4 records in which two records are redundant when >> encountering outer join . >>> >>> To address this issue, this FLIP introduces a new >> MiniBatchStreamingJoinOperator to achieve batch processing which could >> reduce number of outputting redundant messages and avoid unnecessary join >> processes. >>> A new option is added to control the operator to avoid influencing >> existing jobs. >>> >>> Please find more details in the FLIP wiki document [1]. Looking >>> forward to your feedback. >>> >>> [1] >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch >>> >>> Best, >>> Xu Shuai >> >> >> >> -- >> >> Best, >> Benchao Li >>