Hi Zhanghao, Dewei,

Thanks for initiating this discussion. This feature is valuable because
it gives more flexibility for performance tuning of SQL pipelines.

Here are my two cents:

1. In the FLIP, you mentioned concerns about the parallelism of the calc
node and decided to "leave the behavior unchanged for now." This means
that the calc node will use the parallelism of the source operator,
regardless of whether the source parallelism is configured or not. If I
understand correctly, every exec node except the sink exec node (which
can configure its own parallelism) currently inherits the parallelism of
its input. In the design, I didn't see details on how the remaining exec
nodes will handle the input parallelism versus the default parallelism.
Could you elaborate on this?

2. Does the configuration `table.exec.resource.default-parallelism` take
precedence over `scan.parallelism`?

Best,
Jane

On Fri, Sep 15, 2023 at 10:43 AM Yun Tang <myas...@live.com> wrote:

> Thanks for creating this FLIP,
>
> Many users want to configure the source parallelism just as they
> configure the sink parallelism via DDL. Looking forward to this feature.
>
> BTW, I think setting parallelism for each operator would also be
> valuable. That should work with the compiled plan [1] rather than SQL DDL.
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-292%3A+Enhance+COMPILED+PLAN+to+support+operator-level+state+TTL+configuration
>
> Best
> Yun Tang
> ________________________________
> From: Benchao Li <libenc...@apache.org>
> Sent: Thursday, September 14, 2023 19:53
> To: dev@flink.apache.org <dev@flink.apache.org>
> Cc: dewe...@outlook.com <dewe...@outlook.com>
> Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL
> Sources
>
> Thanks Zhanghao, Dewei for preparing the FLIP,
>
> I think this is a long-awaited feature, and I appreciate your effort,
> especially the "Other concerns" part you listed.
>
> Regarding the parallelism of transformations following the source
> transformation, it's indeed a problem that we initially wanted to solve
> when we introduced this feature internally. I'd like to hear more
> opinions on this. Personally, I'm OK with leaving it out of this FLIP for
> the time being.
>
> Chen Zhanghao <zhanghao.c...@outlook.com> wrote on Thu, Sep 14, 2023 at 14:46:
> >
> > Hi Devs,
> >
> > Dewei (cced) and I would like to start a discussion on FLIP-367: Support
> Setting Parallelism for Table/SQL Sources [1].
> >
> > Currently, Flink Table/SQL jobs do not expose fine-grained control of
> operator parallelism to users. FLIP-146 [2] brings us support for setting
> parallelism for sinks, but apart from that, one can only set a default
> global parallelism, which all other operators share.
> However, in many cases, setting parallelism for sources individually is
> preferable:
> >
> > - Many connectors have an upper bound on the parallelism at which they
> can efficiently ingest data. For example, the parallelism of a Kafka
> source is bounded by the number of partitions; any extra tasks would be idle.
> > - Other operators may involve intensive computation and need a larger
> parallelism.
> >
> > We propose to improve the current situation by extending the current
> table source API to support setting parallelism for Table/SQL sources via
> connector options.
> >
> > Looking forward to your feedback.
> >
> > [1] FLIP-367: Support Setting Parallelism for Table/SQL Sources
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150
> >
> > [2] FLIP-146: Improve new TableSource and TableSink interfaces
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-146%3A+Improve+new+TableSource+and+TableSink+interfaces
> >
> > Best,
> > Zhanghao Chen
>
>
>
> --
>
> Best,
> Benchao Li
>
