Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL Sources

Martijn Visser Fri, 15 Sep 2023 05:51:36 -0700

Hi everyone,

Thanks for the FLIP and the discussion. I find it exciting. Thanks for
pushing for this.


Best regards,

Martijn

On Fri, Sep 15, 2023 at 2:25 PM Chen Zhanghao <[email protected]>
wrote:

> Hi Jane,
>
> Thanks for the valuable suggestions.
>
> For Q1, it's indeed an issue. Possible ideas include introducing a fake
> transformation after the source that takes the global default parallelism,
> or simply making exec nodes take the global default parallelism. Both
> approaches, however, prevent potential chaining opportunities, and I'm not
> sure that is acceptable. We'll need to give it deeper thought and polish
> our proposal. We're also more than glad to hear your input on it.
>
> For Q2, scan.parallelism will take precedence, as the more specific
> config should override the more general one.
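
The precedence rule described for Q2 can be sketched as follows. This is a minimal illustration in Python, not Flink code: the helper `effective_source_parallelism` and its dictionary arguments are hypothetical, and only the option names `scan.parallelism` and `table.exec.resource.default-parallelism` come from the thread.

```python
from typing import Mapping, Optional

# Option names as quoted in the thread.
SCAN_PARALLELISM = "scan.parallelism"
DEFAULT_PARALLELISM = "table.exec.resource.default-parallelism"


def effective_source_parallelism(
    connector_options: Mapping[str, str],
    table_config: Mapping[str, str],
) -> Optional[int]:
    """Hypothetical helper: the per-source option wins over the global default.

    Returns None when neither option is set. Value validation is omitted.
    """
    # Check the more specific option first, then fall back to the global one.
    for options, key in (
        (connector_options, SCAN_PARALLELISM),
        (table_config, DEFAULT_PARALLELISM),
    ):
        raw = options.get(key)
        if raw is not None:
            return int(raw)
    return None
```

With `scan.parallelism` set to 4 on the table and a global default of 16, the helper returns 4; with no per-source option it falls back to 16.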
>
> Best,
> Zhanghao Chen
> ________________________________
> From: Jane Chan <[email protected]>
> Sent: September 15, 2023 11:56
> To: [email protected] <[email protected]>
> Cc: [email protected] <[email protected]>
> Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL
> Sources
>
> Hi, Zhanghao, Dewei,
>
> Thanks for initiating this discussion. This feature is valuable in
> providing more flexibility for performance tuning for SQL pipelines.
>
> Here are my two cents:
>
> 1. In the FLIP, you mentioned concerns about the parallelism of the calc
> node and decided to "leave the behavior unchanged for now." This means
> that the calc node will use the parallelism of the source operator,
> regardless of whether the source parallelism is configured. If I
> understand correctly, all exec nodes currently accept their input
> parallelism, except for the sink exec node, which can configure its own.
> The design does not describe how the remaining exec nodes should reconcile
> input parallelism with the default parallelism. Could you elaborate on
> that?
>
> 2. Does the configuration `table.exec.resource.default-parallelism` take
> precedence over `scan.parallelism`?
>
> Best,
> Jane
>
> On Fri, Sep 15, 2023 at 10:43 AM Yun Tang <[email protected]> wrote:
>
> > Thanks for creating this FLIP,
> >
> > Many users want to configure source parallelism via DDL, just as they
> > can for sink parallelism. Looking forward to this feature.
> >
> > BTW, I think setting parallelism for each operator would also be
> > valuable. That should work with the compiled plan [1] rather than SQL
> > DDL.
> >
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-292%3A+Enhance+COMPILED+PLAN+to+support+operator-level+state+TTL+configuration
> >
> > Best
> > Yun Tang
> > ________________________________
> > From: Benchao Li <[email protected]>
> > Sent: Thursday, September 14, 2023 19:53
> > To: [email protected] <[email protected]>
> > Cc: [email protected] <[email protected]>
> > Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL
> > Sources
> >
> > Thanks Zhanghao, Dewei for preparing the FLIP,
> >
> > I think this is a long-awaited feature, and I appreciate your effort,
> > especially the "Other concerns" part you listed.
> >
> > Regarding the parallelism of the transformations that follow the source
> > transformation, it is indeed a problem that we initially wanted to solve
> > when we introduced this feature internally. I'd like to hear more
> > opinions on this. Personally, I'm OK with leaving it out of this FLIP for
> > the time being.
> >
> > On Thu, Sep 14, 2023 at 2:46 PM Chen Zhanghao <[email protected]> wrote:
> > >
> > > Hi Devs,
> > >
> > > Dewei (cc'd) and I would like to start a discussion on FLIP-367:
> > > Support Setting Parallelism for Table/SQL Sources [1].
> > >
> > > Currently, Flink Table/SQL jobs do not expose fine-grained control of
> > > operator parallelism to users. FLIP-146 [2] added support for setting
> > > the parallelism of sinks, but beyond that, one can only set a global
> > > default parallelism that all other operators share. However, in many
> > > cases, setting the parallelism of sources individually is preferable:
> > >
> > > - Many connectors have an upper bound on the parallelism at which they
> > >   can efficiently ingest data. For example, the parallelism of a Kafka
> > >   source is bounded by the number of partitions; any extra tasks would
> > >   be idle.
> > > - Other operators may involve intensive computation and need a larger
> > >   parallelism.
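
The Kafka example in the first bullet comes down to simple arithmetic. A minimal sketch in Python, assuming each partition is read by exactly one source subtask; the function name `idle_source_tasks` is hypothetical and not part of any Flink API:

```python
def idle_source_tasks(source_parallelism: int, partitions: int) -> int:
    """Number of source subtasks left without a partition to read.

    Assumption: each partition is assigned to exactly one subtask, so at
    most `partitions` subtasks can do useful work.
    """
    return max(0, source_parallelism - partitions)
```

For a topic with 8 partitions and a job-wide parallelism of 32, `idle_source_tasks(32, 8)` gives 24 idle subtasks, which is the waste a per-source parallelism setting would avoid.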
> > >
> > > We propose to improve the situation by extending the current table
> > > source API to support setting parallelism for Table/SQL sources via
> > > connector options.
> > >
> > > Looking forward to your feedback.
> > >
> > > [1] FLIP-367: Support Setting Parallelism for Table/SQL Sources
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150
> > >
> > > [2] FLIP-146: Improve new TableSource and TableSink interfaces
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-146%3A+Improve+new+TableSource+and+TableSink+interfaces
> > >
> > >
> > > Best,
> > > Zhanghao Chen
> >
> >
> >
> > --
> >
> > Best,
> > Benchao Li
> >
>
