Hi all,
Thank you for the valuable input.
Based on the current discussion, the minibatch join will follow the three
existing options 'table.exec.mini-batch.enabled',
'table.exec.mini-batch.allow-latency' and 'table.exec.mini-batch.size'. As for
the compaction within the minibatch
Hi Benchao,
I think your suggestion is very reasonable. For most users, having compaction
enabled by default when mini-batch is enabled is the more beneficial approach.
However, I think this is a separate topic about compaction within minibatch
that we could discuss in the future, which is orthogonal to
shuai,
Thanks for the explanations, I understand the scenario you described
now. IIUC, it will be a rather rare case that needs to disable
"compaction" when mini-batch is enabled, so I won't be against
introducing it. However, I would suggest enabling the "compaction" by
default (if mini-batch e
Hi Benchao,
Do you have any other questions about this issue? Also, I would appreciate
your thoughts on the proposal to introduce the new option
'table.exec.mini-batch.compact-changes-enabled'. I'm looking forward to your
feedback.
> On Jan 12, 2024, at 15:01, shuai xu wrote:
>
> Suppose we currently have
Hi shuai,
Thanks for your clarification.
The internal behavior of minibatch processing is not well-defined now.
I think you're right on this point. If you change the goal of the newly
introduced configuration to address this issue, then I'm ok with it.
Best,
Jane
On Mon, Jan 15, 2024 at 2:27
Hi, shuai.
Thanks for this explanation. This scenario sounds reasonable to me. I agree
that we need to split the behavior
in minibatch into two types of options: 1. Whether to enable minibatch to
buffer batch data; 2. Whether to compact
the changelog data while buffering the batch, and merge the data
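
To make the two kinds of options concrete, here is a minimal Python sketch of a mini-batch flush with and without changelog compaction. It is only an illustration under simplifying assumptions, not Flink's implementation: the function name `flush_mini_batch`, the `(kind, key, value)` tuple encoding, and the folding rules are all hypothetical, and the input is assumed to be a well-formed retract-style changelog in which every -U is immediately followed by its +U.

```python
def flush_mini_batch(records, compact_changes):
    """Return the records emitted when a buffered mini-batch is flushed.

    records: list of (kind, key, value) with kind in '+I', '-U', '+U', '-D'.
    compact_changes: hypothetical flag; when False the batch is emitted as-is.
    """
    if not compact_changes:
        # Buffering only: every intermediate change is still emitted.
        return list(records)

    first, last = {}, {}
    for rec in records:
        key = rec[1]
        first.setdefault(key, rec)  # first change seen for this key
        last[key] = rec             # latest change seen for this key

    out = []
    for key, (first_kind, _, old_value) in first.items():
        last_kind, _, new_value = last[key]
        # Assumption: a leading retraction means the row existed before the batch.
        existed_before = first_kind in ("-U", "-D")
        exists_after = last_kind in ("+I", "+U")
        if existed_before and exists_after:
            out.append(("-U", key, old_value))
            out.append(("+U", key, new_value))
        elif existed_before:
            out.append(("-D", key, old_value))
        elif exists_after:
            out.append(("+I", key, new_value))
        # Inserted and deleted within the same batch: nothing is emitted.
    return out


batch = [("+I", "k1", 1), ("-U", "k1", 1), ("+U", "k1", 2),
         ("-U", "k1", 2), ("+U", "k1", 3)]
full = flush_mini_batch(batch, compact_changes=False)
compacted = flush_mini_batch(batch, compact_changes=True)
```

In this sketch the uncompacted flush keeps all five changes for `k1`, while the compacted flush keeps only the net effect, which is the loss of intermediate update details that the audit scenario in this thread is concerned about.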
Hi all.
The point I want to highlight is that minibatch join could potentially yield
an incomplete changelog, which existing jobs do not expect. For example,
a scenario that joins two CDC sources after de-duplicating them, and whose
output would be used for audit analysis, could not accept
Hi shuai,
Thanks for the update! Regarding the newly introduced configuration, I share
the same concern as Benchao and Xuyang.
First of all, in most cases, the fact that users choose to enable
mini-batch configuration indicates they are aware of the trade-off between
throughput and completeness
Hi all,
This is a relatively large optimization that may pose a significant
risk of bugs, so I'd like to keep it from being enabled by default for
now.
Best,
Jingsong
On Fri, Jan 12, 2024 at 3:01 PM shuai xu wrote:
>
> Suppose we currently have a job that joins two CDC sources after
> de-duplica
Suppose we currently have a job that joins two CDC sources after de-duplicating
them and the output is available for audit analysis, and the user turns off the
parameter "table.exec.deduplicate.mini-batch.compact-changes-enabled" to ensure
that it does not lose update details. If we don't introd
Hi, Xu Shuai. Thanks for driving this FLIP.
The CDC message amplification of cascade join has always been a problem for
users. Judging from the
nexmark results, this optimization is very meaningful. I just have the same
doubts as Benchao, why can't we
use minibatch join as the default behavio
> the change might not be supposed for the downstream of the job which requires
> details of changelog
Could you elaborate on this a bit? I've never come across this kind of
requirement before, so I'm curious what scenario requires
it.
shuai xu wrote on Thu, Jan 11, 2024 at 13:08:
>
> Thanks for your re
Hi Jane,
Thanks for your reminder! I missed this.
I updated the FLIP with the UML diagram of MiniBatchStreamingJoinOperator and
linked my POC implementation as a reference.
Both are in the Proposed Changes section.
Best,
Xu Shuai
> On Jan 11, 2024, at 11:18, Jane Chan wrote:
>
> Hi shuai,
>
> Thanks
Thanks for your response, Benchao.
Here are my thoughts on the newly added option.
Users' current jobs are running on a version without minibatch join. If the
existing option to enable minibatch join is utilized, then when users' jobs are
migrated to the new version, the internal behavior of the j
Hi shuai,
Thanks for initiating the discussion. The mini-batch join optimization is
very helpful, particularly for optimizing outer join conditions in CDC
sources and handling cascade joins. And +1 for the proposal.
However, I don't see any details on the proposed
"MiniBatchStreamingJoinOperator"
Thanks, shuai, for driving this. Mini-batch join is a very useful
optimization, so +1 for the general idea.
Regarding the configuration
"table.exec.stream.join.mini-batch-enabled", I'm not sure it's really
necessary. The semantics of the changelog emitted by the Join operator are
eventual consistency, so the
Hi devs,
I’d like to start a discussion on FLIP-415: Introduce a new join operator to
support minibatch[1].
Currently, when performing cascading joins in Flink, there is a pain
point of record amplification. Every record a join operator receives
triggers the join process. However, if reco