Hi Devs,

Dewei (cced) and I would like to start a discussion on FLIP-367: Support 
Setting Parallelism for Table/SQL Sources [1].

Currently, Flink Table/SQL jobs do not expose fine-grained control of operator 
parallelism to users. FLIP-146 [2] brings us support for setting parallelism 
for sinks, but except for that, one can only set a default global parallelism 
and all other operators share the same parallelism. However, in many cases, 
setting parallelism for sources individually is preferable:

- ​Many connectors have an upper bound parallelism to efficiently ingest data. 
For example, the parallelism of a Kafka source is bound by the number of 
partitions, any extra tasks would be idle.
- ​Other operators may involve intensive computation and need a larger 
parallelism.

We propose to improve the current situation by extending the current table 
source API to support setting parallelism for Table/SQL sources via connector 
options.

Looking forward to your feedback.

[1] FLIP-367: Support Setting Parallelism for Table/SQL Sources - Apache Flink 
- Apache Software 
Foundation<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429150>
[2] FLIP-146: Improve new TableSource and TableSink interfaces - Apache Flink - 
Apache Software 
Foundation<https://cwiki.apache.org/confluence/display/FLINK/FLIP-146%3A+Improve+new+TableSource+and+TableSink+interfaces>

Best,
Zhanghao Chen

Reply via email to