+1. Thanks for the FLIP and the discussion. Could SQL hint syntax also be used to set this parallelism?
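
To make the question concrete, here is a rough sketch. It assumes the `scan.parallelism` connector option discussed in this thread, and it assumes that option would also be accepted through Flink's existing dynamic table options hint (`OPTIONS`); whether that second path is supported is exactly what is being asked. The table name `orders`, its columns, the topic, and the broker address are made up for illustration.

```sql
-- As proposed in the FLIP: source parallelism set as a connector option in DDL.
CREATE TABLE orders (
  order_id BIGINT,
  amount   DECIMAL(10, 2)
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',                                   -- hypothetical topic
  'properties.bootstrap.servers' = 'localhost:9092',    -- hypothetical broker
  'format' = 'json',
  'scan.startup.mode' = 'earliest-offset',
  'scan.parallelism' = '4'                              -- option name from this thread
);

-- The open question (an assumption, not confirmed anywhere in this thread):
-- could the same option be overridden per query through the OPTIONS hint?
SELECT order_id, amount
FROM orders /*+ OPTIONS('scan.parallelism' = '8') */;
```

If the hint path were supported, it would allow tuning a source per query without altering the table definition.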

On Fri, Sep 15, 2023 at 20:52, Martijn Visser <martijnvis...@apache.org> wrote:

> Hi everyone,
>
> Thanks for the FLIP and the discussion. I find it exciting. Thanks for
> pushing for this.
>
> Best regards,
>
> Martijn
>
> On Fri, Sep 15, 2023 at 2:25 PM Chen Zhanghao <zhanghao.c...@outlook.com>
> wrote:
>
> > Hi Jane,
> >
> > Thanks for the valuable suggestions.
> >
> > For Q1, it's indeed an issue. Some possible ideas include introducing a
> > fake transformation after the source that takes the global default
> > parallelism, or simply making exec nodes take the global default
> > parallelism, but both ways prevent potential chaining opportunities and
> > I'm not sure if that's good to go. We'll need to give it deeper thought
> > and polish our proposal. We're also more than glad to hear your input
> > on it.
> >
> > For Q2, scan.parallelism will take higher precedence, as the more
> > specific config should win over the more general one.
> >
> > Best,
> > Zhanghao Chen
> > ________________________________
> > From: Jane Chan <qingyue....@gmail.com>
> > Sent: September 15, 2023 11:56
> > To: dev@flink.apache.org <dev@flink.apache.org>
> > Cc: dewe...@outlook.com <dewe...@outlook.com>
> > Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for Table/SQL
> > Sources
> >
> > Hi, Zhanghao, Dewei,
> >
> > Thanks for initiating this discussion. This feature is valuable in
> > providing more flexibility for performance tuning for SQL pipelines.
> >
> > Here are my two cents:
> >
> > 1. In the FLIP, you mentioned concerns about the parallelism of the calc
> > node and concluded to "leave the behavior unchanged for now." This means
> > that the calc node will use the parallelism of the source operator,
> > regardless of whether the source parallelism is configured or not. If I
> > understand correctly, currently, except for the sink exec node (which has
> > the ability to configure its own parallelism), the rest of the exec nodes
> > accept their input parallelism.
> > From the design, I didn't see the details about coping with input and
> > default parallelism for the rest of the exec nodes. Can you elaborate
> > on this?
> >
> > 2. Does the configuration `table.exec.resource.default-parallelism` take
> > precedence over `scan.parallelism`?
> >
> > Best,
> > Jane
> >
> > On Fri, Sep 15, 2023 at 10:43 AM Yun Tang <myas...@live.com> wrote:
> >
> > > Thanks for creating this FLIP,
> > >
> > > Many users want to configure the source parallelism just as they
> > > configure the sink parallelism via DDL. Looking forward to this
> > > feature.
> > >
> > > BTW, I think setting parallelism for each operator would also be
> > > valuable. And this should work with the compiled plan [1] instead of
> > > SQL's DDL.
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-292%3A+Enhance+COMPILED+PLAN+to+support+operator-level+state+TTL+configuration
> > >
> > > Best
> > > Yun Tang
> > > ________________________________
> > > From: Benchao Li <libenc...@apache.org>
> > > Sent: Thursday, September 14, 2023 19:53
> > > To: dev@flink.apache.org <dev@flink.apache.org>
> > > Cc: dewe...@outlook.com <dewe...@outlook.com>
> > > Subject: Re: [DISCUSS] FLIP-367: Support Setting Parallelism for
> > > Table/SQL Sources
> > >
> > > Thanks Zhanghao, Dewei for preparing the FLIP,
> > >
> > > I think this is a long-awaited feature, and I appreciate your effort,
> > > especially the "Other concerns" part you listed.
> > >
> > > Regarding the parallelism of transformations following the source
> > > transformation, it's indeed a problem that we initially wanted to solve
> > > when we introduced this feature internally. I'd like to hear more
> > > opinions on this. Personally I'm ok to leave it out of this FLIP for
> > > the time being.
> > >
> > > On Thu, Sep 14, 2023 at 14:46, Chen Zhanghao <zhanghao.c...@outlook.com>
> > > wrote:
> > >
> > > > Hi Devs,
> > > >
> > > > Dewei (cced) and I would like to start a discussion on FLIP-367:
> > > > Support Setting Parallelism for Table/SQL Sources [1].
> > > >
> > > > Currently, Flink Table/SQL jobs do not expose fine-grained control of
> > > > operator parallelism to users. FLIP-146 [2] brings us support for
> > > > setting parallelism for sinks, but except for that, one can only set a
> > > > default global parallelism and all other operators share the same
> > > > parallelism. However, in many cases, setting parallelism for sources
> > > > individually is preferable:
> > > >
> > > > - Many connectors have an upper bound on the parallelism at which they
> > > >   can efficiently ingest data. For example, the parallelism of a Kafka
> > > >   source is bounded by the number of partitions; any extra tasks would
> > > >   be idle.
> > > > - Other operators may involve intensive computation and need a larger
> > > >   parallelism.
> > > >
> > > > We propose to improve the current situation by extending the current
> > > > table source API to support setting parallelism for Table/SQL sources
> > > > via connector options.
> > > >
> > > > Looking forward to your feedback.
> > > >
> > > > [1] FLIP-367: Support Setting Parallelism for Table/SQL Sources
> > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150
> > > >
> > > > [2] FLIP-146: Improve new TableSource and TableSink interfaces
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-146%3A+Improve+new+TableSource+and+TableSink+interfaces
> > > >
> > > > Best,
> > > > Zhanghao Chen
> > >
> > > --
> > >
> > > Best,
> > > Benchao Li

--
Best
ConradJam