Hi Jane,

Thanks for your reminder!  I missed this.

I updated the FLIP with the UML of MiniBatchStreamingJoinOperator and linking 
my POC implementation as reference. 
They are placed in the part of Proposed Changes. 

Best,
Xu Shuai



> 2024年1月11日 11:18,Jane Chan <qingyue....@gmail.com> 写道:
> 
> Hi shuai,
> 
> Thanks for initiating the discussion. The mini-batch join optimization is
> very helpful, particularly for optimizing outer join conditions in CDC
> sources and handling cascade joins. And +1 for the proposal.
> 
> However, I don't see any details on the proposed
> "MiniBatchStreamingJoinOperator",  would you mind elaborating more about it?
> 
> Best,
> Jane
> 
> 
> On Wed, Jan 10, 2024 at 10:56 PM Benchao Li <libenc...@apache.org> wrote:
> 
>> Thanks shuai for driving this, mini-batch Join is a very useful
>> optimization, +1 for the general idea.
>> 
>> Regarding the configuration
>> "table.exec.stream.join.mini-batch-enabled", I'm not sure it's really
>> necessary. The semantic of changelog emitted by the Join operator is
>> eventual consistency, so there is no much difference between original
>> Join and mini-batch Join from this aspect. Besides, introducing more
>> options would make it more complex for users, harder to understand and
>> maintain, which we should be careful about.
>> 
>> One thing about the implementation, could you make the new operator
>> share the same state definition with the original one?
>> 
>> shuai xu <xushuai...@gmail.com> 于2024年1月10日周三 21:23写道:
>>> 
>>> Hi devs,
>>> 
>>> I’d like to start a discussion on FLIP-415: Introduce a new join
>> operator to support minibatch[1].
>>> 
>>> Currently, when performing cascading connections in Flink, there is a
>> pain point of record amplification. Every record join operator receives
>> would trigger join process. However, if records of +I and -D matches , they
>> could be folded to reduce two times of join process. Besides, records of
>> -U +U might output 4 records in which two records are redundant when
>> encountering outer join .
>>> 
>>> To address this issue, this FLIP introduces a new
>> MiniBatchStreamingJoinOperator to achieve batch processing which could
>> reduce number of outputting redundant messages and avoid unnecessary join
>> processes.
>>> A new option is added to control the operator to avoid influencing
>> existing jobs.
>>> 
>>> Please find more details in the FLIP wiki document [1]. Looking
>>> forward to your feedback.
>>> 
>>> [1]
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch
>>> 
>>> Best,
>>> Xu Shuai
>> 
>> 
>> 
>> --
>> 
>> Best,
>> Benchao Li
>> 

Reply via email to