Re: [DISCUSS] FLIP-415: Introduce a new join operator to support minibatch

shuai xu Wed, 17 Jan 2024 19:08:32 -0800

Hi all,

Thank you for the valuable input.
Based on the current discussion, the minibatch join will reuse the three
existing options 'table.exec.mini-batch.enabled',
'table.exec.mini-batch.allow-latency' and 'table.exec.mini-batch.size'.
Compaction within a minibatch, which was also raised in the discussion, could
be addressed in a future FLIP.
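The following is a rough, hypothetical Java sketch of how the three options could interact. The class MiniBatchTrigger and its method are invented for illustration and are not Flink code. In the sketch, a buffered batch is flushed once it holds 'table.exec.mini-batch.size' records or once 'table.exec.mini-batch.allow-latency' has elapsed. With 'table.exec.mini-batch.enabled' off, every record is processed immediately, as it is today.

```java
import java.time.Duration;

/**
 * Hypothetical sketch (not Flink's implementation): decides when a buffered
 * mini-batch should be joined and emitted, based on the three existing options.
 */
public final class MiniBatchTrigger {
    private final boolean enabled;        // mirrors 'table.exec.mini-batch.enabled'
    private final Duration allowLatency;  // mirrors 'table.exec.mini-batch.allow-latency'
    private final long maxSize;           // mirrors 'table.exec.mini-batch.size'

    public MiniBatchTrigger(boolean enabled, Duration allowLatency, long maxSize) {
        this.enabled = enabled;
        this.allowLatency = allowLatency;
        this.maxSize = maxSize;
    }

    /** Returns true if the buffered records should be processed now. */
    public boolean shouldFlush(long bufferedRecords, Duration sinceFirstBuffered) {
        if (!enabled) {
            return true; // no mini-batch: each record triggers the join immediately
        }
        // Flush when the batch is full or the oldest record has waited long enough.
        return bufferedRecords >= maxSize
                || sinceFirstBuffered.compareTo(allowLatency) >= 0;
    }

    public static void main(String[] args) {
        MiniBatchTrigger trigger = new MiniBatchTrigger(true, Duration.ofSeconds(2), 5000);
        System.out.println(trigger.shouldFlush(10, Duration.ofMillis(500)));   // still buffering
        System.out.println(trigger.shouldFlush(5000, Duration.ofMillis(500))); // size reached
        System.out.println(trigger.shouldFlush(10, Duration.ofSeconds(2)));    // latency reached
    }
}
```

In real Flink, the latency bound is driven by timers/watermarks rather than an explicit elapsed-time argument. This sketch only shows the "whichever comes first" relationship between the two limits.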

Do any of you have further questions regarding this FLIP? If there are no more 
comments, I would like to open a voting thread at 12 a.m. UTC+8 on January 
19th. 
> On Jan 10, 2024, at 21:23, shuai xu <xushuai...@gmail.com> wrote:
> 
> Hi devs,
> 
> I’d like to start a discussion on FLIP-415: Introduce a new join operator to 
> support minibatch[1].
> 
> Currently, cascading joins in Flink suffer from record amplification: every
> record the join operator receives triggers a join process. However, if a +I
> record and a -D record match, they could be folded, saving two join
> processes. Besides, in an outer join a -U/+U pair may produce four output
> records, two of which are redundant.
> 
> To address this issue, this FLIP introduces a new
> MiniBatchStreamingJoinOperator that processes records in mini-batches, which
> reduces the number of redundant output messages and avoids unnecessary join
> processes. A new option is added to control whether the operator is used, so
> that existing jobs are not affected.
> 
> Please find more details in the FLIP wiki document [1]. Looking
> forward to your feedback.
> 
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch
> 
> Best,
> Xu Shuai

Best,
Xu Shuai
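Here is a minimal, hypothetical Java sketch of the +I/-D folding described in the original message. MiniBatchFolder, Kind and Change are illustrative names, not part of FLIP-415 or Flink. The sketch makes two assumptions: rows can be compared as plain strings, and a mini-batch can be treated as a multiset. Under those assumptions, a +I and a -D of the same row cancel out before the join runs. The real operator's folding rules, including how -U/+U pairs are handled, may differ; see the FLIP wiki for the actual design.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: folds changelog records buffered in one mini-batch so
 * that matching +I / -D records cancel out before the join is executed.
 */
public final class MiniBatchFolder {
    enum Kind { INSERT, DELETE } // +I and -D

    record Change(Kind kind, String row) {}

    /** Returns the records that still need to go through the join. */
    static List<Change> fold(List<Change> batch) {
        // Net count per row (+1 per +I, -1 per -D), keeping first-seen order.
        Map<String, Integer> net = new LinkedHashMap<>();
        for (Change c : batch) {
            net.merge(c.row(), c.kind() == Kind.INSERT ? 1 : -1, Integer::sum);
        }
        // Emit only the net changes; rows whose count is zero are dropped entirely.
        List<Change> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : net.entrySet()) {
            int n = e.getValue();
            Kind kind = n > 0 ? Kind.INSERT : Kind.DELETE;
            for (int i = 0; i < Math.abs(n); i++) {
                out.add(new Change(kind, e.getKey()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Change> batch = List.of(
                new Change(Kind.INSERT, "order-1"),
                new Change(Kind.INSERT, "order-2"),
                new Change(Kind.DELETE, "order-1"));
        // Three input records, but only the +I for order-2 reaches the join.
        System.out.println(fold(batch));
    }
}
```

Without folding, each of the three records would trigger its own join process. With folding, the +I and -D for order-1 never reach the join at all.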
