Hi everyone, I'd love to learn more about how different companies approach specifying Flink parallelism. I'm specifically interested in real, production workloads.
I can see a few common patterns:

- Rely on default parallelism and scale by changing the parallelism of the whole pipeline. I guess this only works if the pipeline doesn't have obvious bottlenecks. Also, it looks like the new reactive mode makes specifying parallelism per operator obsolete ( https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/elastic_scaling/#configuration ).
- Rely on default parallelism for most of the operators, but override it for some. For example, it doesn't make sense for a Kafka source to have parallelism higher than the number of partitions it consumes. Some custom sinks could choose lower parallelism to avoid overloading their destinations, some transformation steps could choose higher parallelism to distribute the work better, etc.
- Don't rely on default parallelism at all and configure parallelism explicitly for each operator. This requires very good knowledge of every operator in the pipeline, but it could lead to very good performance.

Is there a different pattern that I've missed? What do you use? Feel free to share any resources.

If you do specify parallelism explicitly, what do you think about reactive mode? Will you use it?

Also, how often do you change parallelism? Do you set it once and forget about it once the pipeline is stable, or do you keep re-evaluating it?

Thanks.
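For reference, per the elastic scaling docs linked above, enabling reactive mode in Flink 1.13 is a single config option in `flink-conf.yaml` (it is only supported for standalone application-mode clusters):

```yaml
# Reactive mode: the job always uses all available TaskManager slots,
# so configured per-operator parallelism is ignored and you scale by
# adding or removing TaskManagers.
scheduler-mode: reactive
```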
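To make the second pattern concrete, here is a small sketch of the sizing logic it implies. This is plain Python, not the Flink API, and the numbers (default parallelism 16, an 8-partition topic, a sink that tolerates 4 concurrent writers) are made-up illustrations:

```python
def source_parallelism(default_parallelism: int, num_partitions: int) -> int:
    """A Kafka source gains nothing from more subtasks than partitions:
    the extra subtasks would sit idle, so clamp to the partition count."""
    return min(default_parallelism, num_partitions)


def sink_parallelism(default_parallelism: int, max_concurrent_writers: int) -> int:
    """A sink may need fewer subtasks than the default so it doesn't
    overload its destination with concurrent writers."""
    return min(default_parallelism, max_concurrent_writers)


# Hypothetical pipeline: default parallelism 16, consuming an 8-partition
# Kafka topic, writing to a destination that tolerates 4 concurrent writers.
default_p = 16
print(source_parallelism(default_p, 8))  # -> 8
print(sink_parallelism(default_p, 4))    # -> 4
```

In the actual job you would then leave the transformation steps on the default and call `setParallelism(...)` only on the source and sink operators.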