Hi Yaroslav,
here's a fourth option that we usually use: we set the default
parallelism once when we initially deploy the app (and maybe change it a
few times early on). From that point on we rescale by either resizing
the TaskManager nodes or redistributing the parallelism over more or
fewer TaskManager nodes.
For example: we start running an app with a default parallelism of 64,
initially distributed over 16 TaskManager nodes with 4 task slots each.
Then we see that we have scaled far too high for the actual workload.
We now have two options: either reduce the hardware on the 16 nodes
(CPU and RAM), or rescale horizontally by redistributing the workload
over 8 TaskManager nodes with 8 task slots each.
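The two layouts can be sketched as Flink configuration. This is only an illustration of the idea; the concrete values match the example above, and the number of TaskManager nodes itself is controlled by the deployment (e.g. the replica count in Kubernetes), not by flink-conf.yaml:

```yaml
# flink-conf.yaml -- layout A: 16 TaskManagers with 4 slots each
# (16 * 4 = 64 slots, matching the job's default parallelism)
parallelism.default: 64
taskmanager.numberOfTaskSlots: 4
```

```yaml
# flink-conf.yaml -- layout B: 8 TaskManagers with 8 slots each
# (8 * 8 = 64 slots, so the job's parallelism stays untouched)
parallelism.default: 64
taskmanager.numberOfTaskSlots: 8
```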
Since we leave the parallelism of the job untouched in either case, we
can easily rescale by redeploying the whole cluster and letting it
resume from the last checkpoint. A cleaner way would probably be to do
this redeployment with explicit savepoints.
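A redeployment with an explicit savepoint could look roughly like this sketch; the job ID, savepoint path, and jar name are placeholders, not from our actual setup:

```shell
# Stop the job and take a savepoint in one step
flink stop --savepointPath s3://my-bucket/savepoints <jobId>

# Redeploy the cluster with the new TaskManager layout,
# then resume the job from the savepoint (detached mode)
flink run -s s3://my-bucket/savepoints/<savepoint-dir> -d my-job.jar
```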
We are doing this on Kubernetes, where both scaling options are really
easy to carry out, but the same concepts should work in any other
setup, too.
Hope that helps
Jan
On 19.05.21 20:00, Yaroslav Tkachenko wrote:
Hi everyone,
I'd love to learn more about how different companies approach
specifying Flink parallelism. I'm specifically interested in real,
production workloads.
I can see a few common patterns:
- Rely on default parallelism, scale by changing parallelism for the
whole pipeline. I guess it only works if the pipeline doesn't have
obvious bottlenecks. Also, it looks like the new reactive mode makes
specifying parallelism for an operator obsolete
(https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/elastic_scaling/#configuration
- Rely on default parallelism for most of the operators, but override
it for some. For example, it doesn't make sense for a Kafka source to
have parallelism higher than the number of partitions it consumes.
Some custom sinks could choose lower parallelism to avoid overloading
their destinations. Some transformation steps could choose higher
parallelism to distribute the work better, etc.
- Don't rely on default parallelism and configure parallelism
explicitly for each operator. This requires very good knowledge of
each operator in the pipeline, but it could lead to very good performance.
Is there a different pattern that I'm missing? What do you use? Feel
free to share any resources.
If you do specify it explicitly, what do you think about the reactive
mode? Will you use it?
Also, how often do you change parallelism? Do you set it once and
forget once the pipeline is stable? Do you keep re-evaluating it?
Thanks.
--
neuland – Büro für Informatik GmbH
Konsul-Smidt-Str. 8g, 28217 Bremen
Telefon (0421) 380107 57
Fax (0421) 380107 99
https://www.neuland-bfi.de
https://twitter.com/neuland
https://facebook.com/neulandbfi
https://xing.com/company/neulandbfi
Geschäftsführer: Thomas Gebauer, Jan Zander
Registergericht: Amtsgericht Bremen, HRB 23395 HB
USt-ID. DE 246585501