Hi Jan, thanks for sharing this! Just wanted to confirm: this approach works because of the task slot sharing feature in Flink, doesn't it?
On Thu, May 20, 2021 at 1:12 AM Jan Brusch <jan.bru...@neuland-bfi.de> wrote: > > Hi Yaroslav, > > here's a fourth option that we usually use: We set the default > parallelism once when we initially deploy the app (maybe change it a few > times in the beginning). From that point on rescale by either resizing > the TaskManager-Nodes or redistributing the parallelism over more / less > TaskManager-Nodes. > > For example: We start to run an app with a default parallelism of 64 and > we initially distribute this over 16 TaskManager-Nodes with 4 taskSlots > each. Then we see that we have scaled way too high for the actual > workload. We now have two options: Either reduce the hardware on the 16 > Nodes (CPU and RAM) or re-scale horizontally by re-distributing the > workload over 8 TaskManager-Nodes with 8 taskSlots each. > > Since we leave the parallelism of the Job untouched in each case, we can > easily rescale by re-deploying the whole cluster and let it resume from > the last checkpoint. A cleaner way would probably be to do this > re-deployment with explicit savepoints. > > We are doing this in kubernetes where both scaling options are really > easy to carry out. But the same concepts should work on any other setup, > too. > > > Hope that helps > > Jan > > On 19.05.21 20:00, Yaroslav Tkachenko wrote: > > Hi everyone, > > > > I'd love to learn more about how different companies approach > > specifying Flink parallelism. I'm specifically interested in real, > > production workloads. > > > > I can see a few common patterns: > > > > - Rely on default parallelism, scale by changing parallelism for the > > whole pipeline. I guess it only works if the pipeline doesn't have > > obvious bottlenecks. Also, it looks like the new reactive mode makes > > specifying parallelism for an operator obsolete > > ( > https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/elastic_scaling/#configuration > > < > https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/elastic_scaling/#configuration > >) > > > > - Rely on default parallelism for most of the operators, but override > > it for some. For example, it doesn't make sense for a Kafka source to > > have parallelism higher than the number of partitions it consumes. > > Some custom sinks could choose lower parallelism to avoid overloading > > their destinations. Some transformation steps could choose higher > > parallelism to distribute the work better, etc. > > > > - Don't rely on default parallelism and configure parallelism > > explicitly for each operator. This requires very good knowledge of > > each operator in the pipeline, but it could lead to very good > performance. > > > > Is there a different pattern that I miss? What do you use? Feel free > > to share any resources. > > > > If you do specify it explicitly, what do you think about the reactive > > mode? Will you use it? > > > > Also, how often do you change parallelism? Do you set it once and > > forget once the pipeline is stable? Do you keep re-evaluating it? > > > > Thanks. > > -- > neuland – Büro für Informatik GmbH > Konsul-Smidt-Str. 8g, 28217 Bremen > > Telefon (0421) 380107 57 > Fax (0421) 380107 99 > https://www.neuland-bfi.de > > https://twitter.com/neuland > https://facebook.com/neulandbfi > https://xing.com/company/neulandbfi > > > Geschäftsführer: Thomas Gebauer, Jan Zander > Registergericht: Amtsgericht Bremen, HRB 23395 HB > USt-ID. DE 246585501 > >