Thanks for opening the FLIP and kicking off this discussion, Xia! The proposed changes fill an important missing piece of the adaptive batch scheduler's dynamic parallelism inference.
Besides that, it is also a good step towards supporting dynamic parallelism inference for streaming sources, e.g. allowing Kafka sources to determine their parallelism automatically based on the number of partitions.

+1 for the proposal.

Thanks,
Zhu

Xia Sun <xingbe...@gmail.com> wrote on Tue, Oct 31, 2023, 16:01:

> Hi everyone,
> I would like to start a discussion on FLIP-379: Dynamic source parallelism
> inference for batch jobs [1].
>
> In general, there are three main ways to set source parallelism for batch
> jobs:
> (1) User-defined source parallelism.
> (2) Connector static parallelism inference.
> (3) Dynamic parallelism inference.
>
> Compared to manually setting parallelism, automatic parallelism inference
> is easier to use and can better adapt to data volumes that vary from day
> to day. However, static parallelism inference cannot leverage runtime
> information, which can lead to inaccurate parallelism. Therefore, for
> batch jobs, dynamic parallelism inference is the most ideal approach, but
> the adaptive batch scheduler's support for it is currently not
> comprehensive.
>
> Therefore, we aim to introduce a general interface that enables the
> adaptive batch scheduler to dynamically infer the source parallelism at
> runtime. Please refer to the FLIP document [1] for more details about the
> proposed design and implementation.
>
> I also thank Zhu Zhu and LiJie Wang for their suggestions during the
> pre-discussion.
> Looking forward to your feedback and suggestions, thanks.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs
>
> Best regards,
> Xia
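To make the idea of a runtime inference hook concrete, here is a minimal sketch of what such a general interface could look like. The names used (DynamicSourceParallelismInference, InferenceContext, getParallelismUpperBound, getDataVolumePerTask) are illustrative assumptions, not the interface proposed in the FLIP; see [1] for the actual design.

// Hypothetical sketch only; names are illustrative, not the FLIP-379 API.
public interface DynamicSourceParallelismInference {

    /**
     * Invoked by the adaptive batch scheduler at runtime, before the source
     * vertex is scheduled, so the source can choose a parallelism based on
     * runtime information (e.g. the number of splits or partitions it
     * discovers).
     */
    int inferParallelism(InferenceContext context);

    /** Runtime information the scheduler could expose to the source. */
    interface InferenceContext {

        /** Upper bound that the inferred parallelism must not exceed. */
        int getParallelismUpperBound();

        /** Desired volume of data each source task should process, in bytes. */
        long getDataVolumePerTask();
    }
}

Under this shape, a Kafka-style source could, for example, return the smaller of its partition count and the upper bound, which is exactly the kind of decision that static inference cannot make accurately before runtime information is available.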