Hi everyone, Thanks for the FLIP and the discussion. I find it exciting. Thanks for pushing for this.

Best regards,
Martijn

On Fri, Sep 15, 2023 at 2:25 PM Chen Zhanghao <zhanghao.c...@outlook.com> wrote:

> Hi Jane,
>
> Thanks for the valuable suggestions.
>
> For Q1, it's indeed an issue. Some possible ideas include introducing a
> fake transformation after the source that takes the global default
> parallelism, or simply making exec nodes take the global default
> parallelism, but both approaches prevent potential chaining opportunities,
> and I'm not sure that's acceptable. We'll need to give it deeper thought
> and polish our proposal. We'd also be glad to hear your input on it.
>
> For Q2, scan.parallelism will take precedence, as the more specific
> config should take higher precedence.
>
> Best,
> Zhanghao Chen
> ________________________________
> From: Jane Chan <qingyue....@gmail.com>
> Sent: September 15, 2023 11:56
> To: dev@flink.apache.org <dev@flink.apache.org>
> Cc: dewe...@outlook.com <dewe...@outlook.com>
> Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL
> Sources
>
> Hi, Zhanghao, Dewei,
>
> Thanks for initiating this discussion. This feature is valuable in
> providing more flexibility for performance tuning of SQL pipelines.
>
> Here are my two cents,
>
> 1. In the FLIP, you mentioned concerns about the parallelism of the calc
> node and concluded to "leave the behavior unchanged for now." This means
> that the calc node will use the parallelism of the source operator,
> regardless of whether the source parallelism is configured or not. If I
> understand correctly, currently, except for the sink exec node (which has
> the ability to configure its own parallelism), the rest of the exec nodes
> accept their input parallelism. From the design, I didn't see how the rest
> of the exec nodes would cope with input and default parallelism. Can you
> elaborate on the details?
>
> 2. Does the configuration `table.exec.resource.default-parallelism` take
> precedence over `scan.parallelism`?
>
> Best,
> Jane
>
> On Fri, Sep 15, 2023 at 10:43 AM Yun Tang <myas...@live.com> wrote:
>
> > Thanks for creating this FLIP,
> >
> > Many users need to configure the source parallelism just as they
> > configure the sink parallelism via DDL. Looking forward to this feature.
> >
> > BTW, I think setting parallelism for each operator would also be
> > valuable. And this should work with the compiled plan [1] instead of
> > SQL's DDL.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-292%3A+Enhance+COMPILED+PLAN+to+support+operator-level+state+TTL+configuration
> >
> > Best
> > Yun Tang
> > ________________________________
> > From: Benchao Li <libenc...@apache.org>
> > Sent: Thursday, September 14, 2023 19:53
> > To: dev@flink.apache.org <dev@flink.apache.org>
> > Cc: dewe...@outlook.com <dewe...@outlook.com>
> > Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for
> > Table/SQL Sources
> >
> > Thanks Zhanghao, Dewei for preparing the FLIP,
> >
> > I think this is a long-awaited feature, and I appreciate your effort,
> > especially the "Other concerns" part you listed.
> >
> > Regarding the parallelism of transformations following the source
> > transformation, it's indeed a problem that we initially wanted to solve
> > when we introduced this feature internally. I'd like to hear more
> > opinions on this. Personally I'm ok with leaving it out of this FLIP for
> > the time being.
> >
> > Chen Zhanghao <zhanghao.c...@outlook.com> wrote on Thu, Sep 14, 2023 at 14:46:
> >
> > > Hi Devs,
> > >
> > > Dewei (cced) and I would like to start a discussion on FLIP-367:
> > > Support Setting Parallelism for Table/SQL Sources [1].
> > >
> > > Currently, Flink Table/SQL jobs do not expose fine-grained control of
> > > operator parallelism to users. FLIP-146 [2] brings us support for
> > > setting parallelism for sinks, but except for that, one can only set a
> > > default global parallelism and all other operators share the same
> > > parallelism.
> > > However, in many cases, setting parallelism for sources individually
> > > is preferable:
> > >
> > > - Many connectors have an upper bound on the parallelism at which they
> > >   can efficiently ingest data. For example, the parallelism of a Kafka
> > >   source is bounded by the number of partitions; any extra tasks would
> > >   be idle.
> > > - Other operators may involve intensive computation and need a larger
> > >   parallelism.
> > >
> > > We propose to improve the current situation by extending the current
> > > table source API to support setting parallelism for Table/SQL sources
> > > via connector options.
> > >
> > > Looking forward to your feedback.
> > >
> > > [1] FLIP-367: Support Setting Parallelism for Table/SQL Sources -
> > > Apache Flink - Apache Software Foundation
> > > <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150>
> > >
> > > [2] FLIP-146: Improve new TableSource and TableSink interfaces -
> > > Apache Flink - Apache Software Foundation
> > > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-146%3A+Improve+new+TableSource+and+TableSink+interfaces>
> > >
> > > Best,
> > > Zhanghao Chen
> >
> > --
> >
> > Best,
> > Benchao Li
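
For readers following the thread, here is a minimal sketch of the two rules discussed above: the answer to Q2 (the per-source `scan.parallelism` option wins over `table.exec.resource.default-parallelism` when it is set) and the Kafka observation that tasks beyond the partition count would be idle. This only models the reasoning in plain Python; it is not Flink code, and the function names `effective_source_parallelism` and `idle_kafka_subtasks` are hypothetical names chosen for illustration.

```python
from typing import Optional


def effective_source_parallelism(
    scan_parallelism: Optional[int],
    default_parallelism: int,
) -> int:
    """Model of the Q2 answer: the more specific option takes precedence.

    scan_parallelism stands for the connector option `scan.parallelism`
    (None when the option is not set); default_parallelism stands for
    `table.exec.resource.default-parallelism`. Hypothetical helper, not a
    Flink API.
    """
    if scan_parallelism is not None:
        if scan_parallelism <= 0:
            raise ValueError("scan.parallelism must be a positive integer")
        return scan_parallelism
    return default_parallelism


def idle_kafka_subtasks(source_parallelism: int, partitions: int) -> int:
    """Source subtasks beyond the partition count have nothing to read."""
    return max(0, source_parallelism - partitions)


if __name__ == "__main__":
    # Global default of 16, Kafka topic with 4 partitions.
    unset = effective_source_parallelism(None, 16)
    tuned = effective_source_parallelism(4, 16)
    print(unset, idle_kafka_subtasks(unset, 4))
    print(tuned, idle_kafka_subtasks(tuned, 4))
```

With a global default of 16 and a 4-partition topic, leaving the source option unset would leave 12 source subtasks idle under this model, while setting it to 4 leaves none. How the downstream exec nodes pick their parallelism (Q1) is left open in the thread and is deliberately not modeled here.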