Hi devs,

I’d like to start a discussion on FLIP-415: Introduce a new join operator to support minibatch [1].

Currently, cascading joins in Flink suffer from record amplification. Every record that a join operator receives triggers a join process. However, if a +I record and a -D record match, they can be folded, which saves two join processes. In addition, for an outer join, a -U/+U pair may produce 4 output records, two of which are redundant.

To address this issue, this FLIP introduces a new MiniBatchStreamingJoinOperator that processes records in batches, which reduces the number of redundant output messages and avoids unnecessary join processes. A new option is added to control the operator so that existing jobs are not affected.

Please find more details in the FLIP wiki document [1]. Looking forward to your feedback.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch

Best,
Xu Shuai
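
P.S. To make the folding idea concrete, here is a rough sketch in plain Java. It is not the code proposed in the FLIP and does not use any Flink API. The class MiniBatchFoldSketch, the Kind enum, the Change type, and the fold method are all hypothetical names made up for illustration. It assumes a simplified rule: within one buffered batch, a -D that follows a +I of the same row cancels it, so neither record reaches the join logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a toy model of folding +I / -D pairs inside one mini-batch.
// All names here are hypothetical and are not part of Flink or of FLIP-415.
public class MiniBatchFoldSketch {

    public enum Kind { INSERT, DELETE }

    public static final class Change {
        public final Kind kind;
        public final String row;

        public Change(Kind kind, String row) {
            this.kind = kind;
            this.row = row;
        }

        @Override
        public String toString() {
            return (kind == Kind.INSERT ? "+I" : "-D") + "(" + row + ")";
        }
    }

    // Returns the records that still need to be joined after folding.
    public static List<Change> fold(List<Change> batch) {
        List<Change> pending = new ArrayList<>();
        for (Change c : batch) {
            if (c.kind == Kind.DELETE) {
                int match = lastInsertOf(pending, c.row);
                if (match >= 0) {
                    // The -D cancels an earlier +I of the same row in this batch.
                    pending.remove(match);
                    continue;
                }
            }
            pending.add(c);
        }
        return pending;
    }

    // Index of the most recent buffered +I for the given row, or -1 if none.
    private static int lastInsertOf(List<Change> pending, String row) {
        for (int i = pending.size() - 1; i >= 0; i--) {
            Change p = pending.get(i);
            if (p.kind == Kind.INSERT && p.row.equals(row)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Change> batch = Arrays.asList(
                new Change(Kind.INSERT, "a"),
                new Change(Kind.INSERT, "b"),
                new Change(Kind.DELETE, "a"),
                new Change(Kind.DELETE, "c"));
        List<Change> folded = fold(batch);
        // Prints: 4 records folded to 2: [+I(b), -D(c)]
        System.out.println(batch.size() + " records folded to " + folded.size() + ": " + folded);
    }
}
```

In this toy batch, +I(a) and -D(a) cancel, so only two of the four records would trigger a join process. The -D(c) is kept because its matching +I is not in the same batch.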
