Hi Benchao,

Do you have any other questions about this issue? I would also appreciate your thoughts on the proposal to introduce the new option 'table.exec.mini-batch.compact-changes-enabled'. I’m looking forward to your feedback.

> On Jan 12, 2024, at 15:01, shuai xu <xushuai...@gmail.com> wrote:
>
> Suppose we currently have a job that joins two CDC sources after
> de-duplicating them, and its output is used for audit analysis. The user
> has turned off the parameter
> "table.exec.deduplicate.mini-batch.compact-changes-enabled" to ensure that
> no update details are lost. If we don't introduce this new parameter, then
> after the user upgrades, some update details may be lost because the
> mini-batch join is enabled by default, resulting in distorted audit
> results.
>
>> On Jan 11, 2024, at 16:19, Benchao Li <libenc...@apache.org> wrote:
>>
>>> the change might not be expected by the downstream of the job, which
>>> requires the details of the changelog
>>
>> Could you elaborate on this a bit? I've never met this kind of
>> requirement before, and I'm curious what scenario requires
>> this.
>>
>> On Thu, Jan 11, 2024 at 13:08, shuai xu <xushuai...@gmail.com> wrote:
>>>
>>> Thanks for your response, Benchao.
>>>
>>> Here are my thoughts on the newly added option.
>>> Users' current jobs are running on a version without minibatch join. If the
>>> existing option is used to enable minibatch join, then when users' jobs
>>> are migrated to the new version, the internal behavior of the join
>>> operation within the jobs will change. Although the semantics of the
>>> changelog emitted by the Join operator are eventual consistency, the change
>>> might not be expected by the downstream of the job, which requires the
>>> details of the changelog. This newly added option also follows the example of
>>> 'table.exec.deduplicate.mini-batch.compact-changes-enabled'.
>>>
>>> As for the implementation, the new operator shares the state of the original
>>> operator and merely adds a minibatch buffer for storing records to
>>> do some optimization. The storage remains consistent, and there is only a
>>> minor modification to the computational logic.
>>>
>>> Best,
>>> Xu Shuai
>>>
>>>> On Jan 10, 2024, at 22:56, Benchao Li <libenc...@apache.org> wrote:
>>>>
>>>> Thanks shuai for driving this, mini-batch Join is a very useful
>>>> optimization, +1 for the general idea.
>>>>
>>>> Regarding the configuration
>>>> "table.exec.stream.join.mini-batch-enabled", I'm not sure it's really
>>>> necessary. The semantics of the changelog emitted by the Join operator are
>>>> eventual consistency, so there is not much difference between the original
>>>> Join and the mini-batch Join in this respect. Besides, introducing more
>>>> options would make it more complex for users, harder to understand and
>>>> maintain, which we should be careful about.
>>>>
>>>> One thing about the implementation: could you make the new operator
>>>> share the same state definition with the original one?
>>>>
>>>> On Wed, Jan 10, 2024 at 21:23, shuai xu <xushuai...@gmail.com> wrote:
>>>>>
>>>>> Hi devs,
>>>>>
>>>>> I’d like to start a discussion on FLIP-415: Introduce a new join operator
>>>>> to support minibatch [1].
>>>>>
>>>>> Currently, when performing cascading joins in Flink, record
>>>>> amplification is a pain point. Every record the join operator receives
>>>>> triggers a join process. However, if a +I record and a -D record match,
>>>>> they can be folded, saving two join processes. Besides,
>>>>> a -U/+U pair might produce 4 output records, two of which are
>>>>> redundant, when it encounters an outer join.
>>>>>
>>>>> To address this issue, this FLIP introduces a new
>>>>> MiniBatchStreamingJoinOperator that processes records in batches, which
>>>>> can reduce the number of redundant output messages and avoid unnecessary
>>>>> join processes.
>>>>> A new option is added to control the operator to avoid influencing
>>>>> existing jobs.
>>>>>
>>>>> Please find more details in the FLIP wiki document [1]. Looking
>>>>> forward to your feedback.
>>>>>
>>>>> [1]
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-415%3A+Introduce+a+new+join+operator+to+support+minibatch
>>>>>
>>>>> Best,
>>>>> Xu Shuai
>>>>
>>>> --
>>>>
>>>> Best,
>>>> Benchao Li
>>>
>>
>> --
>>
>> Best,
>> Benchao Li
>
> Best,
> Xu Shuai

Best,
Xu Shuai
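
For readers following the thread, the record folding described in the quoted FLIP-415 proposal (a buffered +I cancelled by a matching -D, and intermediate -U/+U pairs collapsing) can be illustrated with a minimal sketch. This is not Flink's MiniBatchStreamingJoinOperator; the function name fold_mini_batch, the Record type, and the rule "a retraction cancels the latest buffered addition of the identical row" are assumptions made purely for illustration.

```python
from typing import Hashable, List, Tuple

# Hypothetical record shape for illustration: (row kind, row),
# e.g. ("+I", ("k1", 10)). Row kinds follow the thread: +I, -D, -U, +U.
Record = Tuple[str, Hashable]

ADDS = {"+I", "+U"}      # kinds that add a row
RETRACTS = {"-D", "-U"}  # kinds that retract a row


def fold_mini_batch(records: List[Record]) -> List[Record]:
    """Fold a buffered mini-batch before it reaches the join logic.

    Assumed rule (a simplification, not the FLIP's exact algorithm):
    a retraction cancels the most recent buffered addition of the
    identical row, so neither record needs to trigger a join.
    """
    buffered: List[Record] = []
    for kind, row in records:
        if kind in RETRACTS:
            # Search backwards for a pending addition of the same row.
            for i in range(len(buffered) - 1, -1, -1):
                b_kind, b_row = buffered[i]
                if b_kind in ADDS and b_row == row:
                    del buffered[i]  # the pair cancels out
                    break
            else:
                # Nothing to cancel: the retraction must be forwarded.
                buffered.append((kind, row))
        else:
            buffered.append((kind, row))
    return buffered


if __name__ == "__main__":
    batch = [
        ("+I", ("k1", 1)),
        ("-D", ("k1", 1)),  # cancels the +I above
        ("+I", ("k2", 5)),
    ]
    print(fold_mini_batch(batch))
```

In this sketch only the surviving records would be handed to the join, which is the saving the proposal describes. It also shows the concern raised in the thread: the folded-away intermediate changes never reach the downstream, so a consumer that audits every update would not see them.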