Re: Parallelism in Production: Best Practices

Yaroslav Tkachenko Thu, 20 May 2021 09:40:46 -0700

Hi Jan, thanks for sharing this!

Just wanted to confirm: this approach works because of the task slot
sharing feature in Flink, doesn't it?


On Thu, May 20, 2021 at 1:12 AM Jan Brusch <jan.bru...@neuland-bfi.de>
wrote:

>
> Hi Yaroslav,
>
> here's a fourth option that we usually use: We set the default
> parallelism once when we initially deploy the app (maybe change it a few
> times in the beginning). From that point on rescale by either resizing
> the TaskManager-Nodes or redistributing the parallelism over more / less
> TaskManager-Nodes.
>
> For example: We start to run an app with a default parallelism of 64 and
> we initially distribute this over 16 TaskManager-Nodes with 4 taskSlots
> each. Then we see that we have scaled way too high for the actual
> workload. We now have two options: Either reduce the hardware on the 16
> Nodes (CPU and RAM) or re-scale horizontally by re-distributing the
> workload over 8 TaskManager-Nodes with 8 taskSlots each.
>
> Since we leave the parallelism of the Job untouched in each case, we can
> easily rescale by re-deploying the whole cluster and let it resume from
> the last checkpoint. A cleaner way would probably be to do this
> re-deployment with explicit savepoints.
>
> We are doing this in kubernetes where both scaling options are really
> easy to carry out. But the same concepts should work on any other setup,
> too.
>
>
> Hope that helps
>
> Jan
>
> On 19.05.21 20:00, Yaroslav Tkachenko wrote:
> > Hi everyone,
> >
> > I'd love to learn more about how different companies approach
> > specifying Flink parallelism. I'm specifically interested in real,
> > production workloads.
> >
> > I can see a few common patterns:
> >
> > - Rely on default parallelism, scale by changing parallelism for the
> > whole pipeline. I guess it only works if the pipeline doesn't have
> > obvious bottlenecks. Also, it looks like the new reactive mode makes
> > specifying parallelism for an operator obsolete
> > (
> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/elastic_scaling/#configuration
> > <
> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/elastic_scaling/#configuration
> >)
> >
> > - Rely on default parallelism for most of the operators, but override
> > it for some. For example, it doesn't make sense for a Kafka source to
> > have parallelism higher than the number of partitions it consumes.
> > Some custom sinks could choose lower parallelism to avoid overloading
> > their destinations. Some transformation steps could choose higher
> > parallelism to distribute the work better, etc.
> >
> > - Don't rely on default parallelism and configure parallelism
> > explicitly for each operator. This requires very good knowledge of
> > each operator in the pipeline, but it could lead to very good
> performance.
> >
> > Is there a different pattern that I miss? What do you use? Feel free
> > to share any resources.
> >
> > If you do specify it explicitly, what do you think about the reactive
> > mode? Will you use it?
> >
> > Also, how often do you change parallelism? Do you set it once and
> > forget once the pipeline is stable? Do you keep re-evaluating it?
> >
> > Thanks.
>
> --
> neuland  – Büro für Informatik GmbH
> Konsul-Smidt-Str. 8g, 28217 Bremen
>
> Telefon (0421) 380107 57
> Fax (0421) 380107 99
> https://www.neuland-bfi.de
>
> https://twitter.com/neuland
> https://facebook.com/neulandbfi
> https://xing.com/company/neulandbfi
>
>
> Geschäftsführer: Thomas Gebauer, Jan Zander
> Registergericht: Amtsgericht Bremen, HRB 23395 HB
> USt-ID. DE 246585501
>
>

Re: Parallelism in Production: Best Practices

Reply via email to