Hey,
I'm writing a Flink app that does some transformation on an event consumed from 
Kafka and then creates time windows keyed by some field, and apply an 
aggregation on all those events.
When I run it with parallelism 1, I get a throughput of around 1.6K events per 
second (so also 1.6K events per slot). With parallelism 5, that goes down to 
1.2K events per slot, and when I increase the parallelism to 10, it drops to 
600 events per slot.
Which means that parallelism 5 and parallelism 10, give me the same total 
throughput (1.2x5 = 600x10).

I noticed that although I have 3 Task Managers, all the all the tasks are run 
on the same machine, causing it's CPU to spike and probably, this is the reason 
that the throughput dramatically decreases. After increasing the parallelism to 
15 and now tasks run on 2/3 machines, the average throughput per slot is still 
around 600.

What could cause this dramatic decrease in performance?

Extra info:

  *   Flink version 1.9.2
  *   Flink High Availability mode
  *   3 task managers, 66 slots total

Execution plan:
[cid:04ba7b84-819d-45b6-98cd-446127a0255b]

Any help would be much appreciated 🙂



Sidney Feiner / Data Platform Developer
M: +972.528197720 / Skype: sidney.feiner.startapp

[emailsignature]

Reply via email to