Hi Devs, Dewei (cced) and I would like to start a discussion on FLIP-367: Support Setting Parallelism for Table/SQL Sources [1].
Currently, Flink Table/SQL jobs do not expose fine-grained control of operator parallelism to users. FLIP-146 [2] brings us support for setting parallelism for sinks, but except for that, one can only set a default global parallelism and all other operators share the same parallelism. However, in many cases, setting parallelism for sources individually is preferable: - Many connectors have an upper bound parallelism to efficiently ingest data. For example, the parallelism of a Kafka source is bound by the number of partitions, any extra tasks would be idle. - Other operators may involve intensive computation and need a larger parallelism. We propose to improve the current situation by extending the current table source API to support setting parallelism for Table/SQL sources via connector options. Looking forward to your feedback. [1] FLIP-367: Support Setting Parallelism for Table/SQL Sources - Apache Flink - Apache Software Foundation<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150> [2] FLIP-146: Improve new TableSource and TableSink interfaces - Apache Flink - Apache Software Foundation<https://cwiki.apache.org/confluence/display/FLINK/FLIP-146%3A+Improve+new+TableSource+and+TableSink+interfaces> Best, Zhanghao Chen