Hi all,
Thank you for the valuable input.
Based on the current discussion, the minibatch join will follow the three
existing options 'table.exec.mini-batch.enabled',
'table.exec.mini-batch.allow-latency' and 'table.exec.mini-batch.size'. As for
the compaction within the minibatch
Hi Benchao,
I think your suggestion is very reasonable. For most users, having compaction
enabled by default if mini-batch is enabled is a more beneficial approach.
However, I think this is another thing we could discuss in the future about
compaction within minibatch, which is an orthogonal to
shuai,
Thanks for the explanations, I understand the scenario you described
now. IIUC, it will be a rather rare case that needs to disable
"compaction" when mini-batch is enabled, so I won't be against
introducing it. However, I would suggest enabling "compaction" by
default (if mini-batch e
Hi Benchao,
Do you have any other questions about this issue? Also, I would appreciate
your thoughts on the proposal to introduce the new option
'table.exec.mini-batch.compact-changes-enabled'. I'm looking forward to your
feedback.
> On Jan 12, 2024, at 15:01, shuai xu wrote:
>
> Suppose we currently have
Hi shuai,
Thanks for your clarification.
The internal behavior of minibatch processing is not well-defined now.
I think you're right on this point. If you change the goal of the newly
introduced configuration to address this issue, then I'm ok with it.
Best,
Jane
On Mon, Jan 15, 2024 at 2:27
Hi all.
The point I want to highlight is that minibatch join could potentially yield
an incomplete changelog, which existing jobs are not expected to produce. For
example, a job that joins two CDC sources after de-duplicating them, and whose
output is used for audit analysis, could not accept
Hi shuai,
Thanks for the update! Regarding the newly introduced configuration, I hold
the same concern as Benchao and Xuyang.
First of all, in most cases, the fact that users choose to enable
mini-batch configuration indicates they are aware of the trade-off between
throughput and completeness
Hi all,
This is a relatively large optimization that may pose a significant
risk of bugs, so I'd like to keep it from being enabled by default for
now.
Best,
Jingsong
On Fri, Jan 12, 2024 at 3:01 PM shuai xu wrote:
>
> Suppose we currently have a job that joins two CDC sources after
> de-duplica
Suppose we currently have a job that joins two CDC sources after de-duplicating
them and the output is available for audit analysis, and the user turns off the
parameter "table.exec.deduplicate.mini-batch.compact-changes-enabled" to ensure
that it does not lose update details. If we don't introd
> the change might not be supposed for the downstream of the job which requires
> details of changelog
Could you elaborate on this a bit? I've never met this kind of
requirement before, and I'm curious what scenario requires
this.
On Thu, Jan 11, 2024 at 13:08, shuai xu wrote:
>
> Thanks for your re
Hi Jane,
Thanks for your reminder! I missed this.
I updated the FLIP with the UML of MiniBatchStreamingJoinOperator and linked
my POC implementation as a reference.
Both are in the Proposed Changes section.
Best,
Xu Shuai
> On Jan 11, 2024, at 11:18, Jane Chan wrote:
>
> Hi shuai,
>
> Thanks
Thanks for your response, Benchao.
Here is my thought on the newly added option.
Users' current jobs are running on a version without minibatch join. If the
existing option to enable minibatch join is utilized, then when users' jobs are
migrated to the new version, the internal behavior of the j
Hi shuai,
Thanks for initiating the discussion. The mini-batch join optimization is
very helpful, particularly for optimizing outer join conditions in CDC
sources and handling cascade joins. And +1 for the proposal.
However, I don't see any details on the proposed
"MiniBatchStreamingJoinOperator"
Thanks shuai for driving this, mini-batch Join is a very useful
optimization, +1 for the general idea.
Regarding the configuration
"table.exec.stream.join.mini-batch-enabled", I'm not sure it's really
necessary. The semantics of the changelog emitted by the Join operator are
eventual consistency, so the