Hey, I'm writing a Flink app that applies some transformations to events consumed from Kafka, then creates time windows keyed by some field and applies an aggregation over the events in each window. When I run it with parallelism 1, I get a throughput of around 1.6K events per second (so also 1.6K events per second per slot). With parallelism 5, that goes down to 1.2K events per second per slot, and when I increase the parallelism to 10, it drops to 600 events per second per slot. This means that parallelism 5 and parallelism 10 give me the same total throughput (1.2K x 5 = 600 x 10 = 6K events per second).
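To make the arithmetic explicit, here is a quick sketch of the totals and of how far they fall from linear scaling. The numbers are my rounded measurements (including the parallelism 15 run, which also sits at roughly 600 per slot), and the names `per_slot`, `totals` and `efficiency` are just for illustration; they are not from the job itself.

```python
# Rounded per-slot throughput (events/second) observed at each parallelism.
per_slot = {1: 1600, 5: 1200, 10: 600, 15: 600}

# Total throughput is simply parallelism * per-slot throughput.
totals = {p: p * rate for p, rate in per_slot.items()}

# Scaling efficiency: observed total divided by the total we would get
# if every slot kept the single-slot rate (perfect linear scaling).
baseline = per_slot[1]
efficiency = {p: totals[p] / (p * baseline) for p in per_slot}

for p in sorted(per_slot):
    print(f"parallelism={p:>2}  per-slot={per_slot[p]:>4}  "
          f"total={totals[p]:>5}  efficiency={efficiency[p]:.3f}")
```

So going from parallelism 5 to 10 adds no total throughput at all, and per-slot efficiency falls from 75% to under 40% of the single-slot rate.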
I noticed that although I have 3 Task Managers, all the tasks run on the same machine, causing its CPU to spike, and this is probably the reason the throughput decreases so dramatically. After increasing the parallelism to 15, tasks now run on 2 of the 3 machines, but the average throughput per slot is still around 600. What could cause this dramatic decrease in performance?

Extra info:
* Flink version 1.9.2
* Flink High Availability mode
* 3 task managers, 66 slots total

Execution plan: [image]

Any help would be much appreciated 🙂

Sidney Feiner / Data Platform Developer
M: +972.528197720 / Skype: sidney.feiner.startapp