Hi everyone, I'd love to learn more about how different companies approach specifying Flink parallelism. I'm specifically interested in real, production workloads.
I can see a few common patterns:

- Rely on default parallelism and scale by changing the parallelism of the whole pipeline. I guess this only works if the pipeline doesn't have obvious bottlenecks. Also, it looks like the new reactive mode makes specifying parallelism per operator obsolete ( https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/elastic_scaling/#configuration ).
- Rely on default parallelism for most of the operators, but override it for some. For example, it doesn't make sense for a Kafka source to have parallelism higher than the number of partitions it consumes. Some custom sinks could choose lower parallelism to avoid overloading their destinations, some transformation steps could choose higher parallelism to distribute the work better, etc.
- Don't rely on default parallelism at all and configure parallelism explicitly for each operator. This requires very good knowledge of every operator in the pipeline, but it could lead to very good performance.

Is there a different pattern that I've missed? What do you use? Feel free to share any resources.

If you do specify parallelism explicitly, what do you think about reactive mode? Will you use it?

Also, how often do you change parallelism? Do you set it once and forget about it once the pipeline is stable, or do you keep re-evaluating it?

Thanks.
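For reference, per the elastic scaling docs linked above, enabling reactive mode in Flink 1.13 is a single config option in `flink-conf.yaml` (it is only supported for standalone application-mode clusters):

```yaml
# Reactive mode: the job always uses all available TaskManager slots,
# so configured per-operator parallelism is ignored and you scale by
# adding or removing TaskManagers.
scheduler-mode: reactive
```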
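To make the second pattern concrete, here is a small sketch of the sizing logic it implies. This is plain Python, not the Flink API, and the numbers (default parallelism 16, an 8-partition topic, a sink that tolerates 4 concurrent writers) are made-up illustrations:

```python
def source_parallelism(default_parallelism: int, num_partitions: int) -> int:
    """A Kafka source gains nothing from more subtasks than partitions:
    the extra subtasks would sit idle, so clamp to the partition count."""
    return min(default_parallelism, num_partitions)


def sink_parallelism(default_parallelism: int, max_concurrent_writers: int) -> int:
    """A sink may need fewer subtasks than the default so it doesn't
    overload its destination with concurrent writers."""
    return min(default_parallelism, max_concurrent_writers)


# Hypothetical pipeline: default parallelism 16, consuming an 8-partition
# Kafka topic, writing to a destination that tolerates 4 concurrent writers.
default_p = 16
print(source_parallelism(default_p, 8))  # -> 8
print(sink_parallelism(default_p, 4))    # -> 4
```

In the actual job you would then leave the transformation steps on the default and call `setParallelism(...)` only on the source and sink operators.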