Re: [DISCUSS FLINKSQL PARALLELISM]

Jing Ge Mon, 24 Apr 2023 00:39:20 -0700

Hi Green,



Since FLIP-292 opened the door to fine-grained tuning at the operator level
for Flink SQL jobs, I would also suggest leveraging the compiled JSON plan for
further config optimization, as Yun Tang already mentioned. We should
consider making it (leveraging the compiled JSON plan) the standard process
for fine-grained tuning of Flink SQL jobs.
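
As a rough illustration of what tuning through the compiled JSON plan could
look like, here is a minimal sketch that edits a per-operator state TTL in a
toy plan. The field names ("nodes", "state", "ttl", "name") follow the layout
described in FLIP-292 as I understand it; the plan content and the helper
set_state_ttl are hypothetical and only meant to show the idea, not Flink's
actual tooling.

```python
import json

# Toy stand-in for a compiled plan. The structure is an assumption based on
# the FLIP-292 description; a real plan contains many more fields.
PLAN_JSON = """
{
  "nodes": [
    {"id": 1, "type": "stream-exec-table-source-scan_1"},
    {"id": 2, "type": "stream-exec-join_1",
     "state": [
       {"index": 0, "ttl": "0 ms", "name": "leftState"},
       {"index": 1, "ttl": "0 ms", "name": "rightState"}
     ]}
  ]
}
"""


def set_state_ttl(plan, node_id, state_name, ttl):
    """Set the TTL of one named state on one node, in place.

    Hypothetical helper: returns the number of state entries updated.
    """
    updated = 0
    for node in plan.get("nodes", []):
        if node.get("id") != node_id:
            continue
        # Stateless nodes have no "state" list, so default to empty.
        for state in node.get("state", []):
            if state.get("name") == state_name:
                state["ttl"] = ttl
                updated += 1
    return updated


plan = json.loads(PLAN_JSON)
changed = set_state_ttl(plan, 2, "leftState", "1 h")
tuned_json = json.dumps(plan, indent=2)  # would be written back to the plan file
```

A per-operator parallelism setting could be handled the same way if the plan
format were extended with such a field, which is what this thread is about.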



Best regards,

Jing

On Wed, Apr 19, 2023 at 8:44 AM Yun Tang <[email protected]> wrote:

> I noticed that Yuxia had replied that "sink.parallelism" could help in
> some cases.
>
> I think a better way is to integrate it with the StreamGraph or extend
> CompiledPlan, just as FLIP-292 does for setting state TTL per operator [1].
>
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240883951
>
> Best
> Yun Tang
> ________________________________
> From: GREEN <[email protected]>
> Sent: Tuesday, April 18, 2023 17:21
> To: dev <[email protected]>
> Subject: Re: [DISCUSS FLINKSQL PARALLELISM]
>
> While the StreamGraph is being generated, I can modify the edge
> partitioner by configuring parameters.
> I just need to know the structure of the StreamGraph in advance; this can
> be obtained by printing it to the log.
>
>
>
> ---Original---
> From: "liu ron"<[email protected]>
> Date: Tue, Apr 18, 2023 09:37 AM
> To: "dev"<[email protected]>;
> Subject: Re: [DISCUSS FLINKSQL PARALLELISM]
>
>
> Hi, Green
>
> Thanks for driving this discussion. In batch mode we have the Adaptive
> Batch Scheduler, which automatically derives operator parallelism based on
> data volume at runtime, so we don't need to care about the parallelism.
> However, in streaming mode, Flink SQL can currently only set operator
> parallelism globally, and many users would like to set the parallelism
> of an operator individually. This seems to be a pain point at the moment,
> and it would make sense to support setting parallelism at operator
> granularity. Do you have any ideas for a solution to this problem?
>
> Best,
> Ron
>
>
> GREEN <[email protected]> wrote on Fri, Apr 14, 2023 at 16:03:
>
> > Problem:
> >
> >
> > Currently, Flink SQL can only set one uniform parallelism for the job; it
> > cannot set parallelism for each operator.
> > This can waste resources when parallelism is high and the data volume
> > is small. It may also produce too many small files when writing to
> > HDFS.
> >
> >
> > Solution:
> > I can modify Flink SQL to support operator-level parallelism. Is it
> > meaningful to do this? Let's discuss.
>
