I have a job written entirely in Flink SQL. The first part of the program reads 10 input topics and produces one output topic with normalized messages and some filtering applied (really simple: a few WHERE clauses on fields plus substring matching). Nine of the topics produce between hundreds and thousands of messages per second and have 4–10 partitions each. The remaining topic produces 150K messages per second and has 500 partitions. All ten are unioned into the output topic.
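To make the job shape concrete, here is a minimal sketch of that kind of pipeline. Everything in it is a hypothetical placeholder (topic names, field names, bootstrap servers, and the filter predicates), not my actual job:

```sql
-- Sketch only: names, fields, and predicates are made up for illustration.

CREATE TABLE big_topic (            -- stands in for the 150K msg/s topic (500 partitions)
  id       STRING,
  category STRING,
  payload  STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'big-topic',
  'properties.bootstrap.servers' = 'kafka:9092',
  'properties.group.id' = 'normalizer',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'json'
);

CREATE TABLE small_topic_1 (        -- one of the nine smaller topics (4-10 partitions)
  id       STRING,
  category STRING,
  payload  STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'small-topic-1',
  'properties.bootstrap.servers' = 'kafka:9092',
  'properties.group.id' = 'normalizer',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'json'
);

CREATE TABLE normalized_out (       -- single normalized output topic
  id      STRING,
  payload STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'normalized-out',
  'properties.bootstrap.servers' = 'kafka:9092',
  'format' = 'json'
);

-- Normalization plus simple WHERE/substring filtering, unioned into one sink.
INSERT INTO normalized_out
SELECT id, payload FROM big_topic
WHERE category = 'A' AND payload LIKE '%needle%'
UNION ALL
SELECT id, payload FROM small_topic_1
WHERE category = 'B';
```

The real job does the same thing across all ten sources, with per-source field normalization before the UNION ALL.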
After filtering, the output rate needed to avoid lag is around 60K messages per second. I've been testing different combinations of parallelism, slots, and pods (everything runs on Kubernetes), but I'm far from those numbers. In the latest configuration I used 20 pods with a parallelism of 120 and 4 slots per TaskManager. With this setup I reach approximately 20K messages per second, but I can't consume the largest topic as fast as messages are produced. Additionally, a parallelism of 120 creates hundreds of subtasks for the smaller topics, which do almost nothing but still consume some resources even when idle. I started with a parallelism of 12 and got about 1,000-3,000 messages per second.

When I check CPU and memory usage on the pods, I don't see any problem and they are far from their limits: each TaskManager has 4 GB and 2 CPUs, and they never come close to saturating the CPU.

It's a requirement to do everything 100% in SQL. How can I improve the throughput? And should I be concerned about the hundreds of subtasks created for the smaller topics?
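For reference, this is how I set the parallelism for these tests. The settings below mirror the configuration described above (they are the values I tried, not a recommendation), applied from the SQL client so the job stays 100% SQL; the slot and resource options are cluster-level and set in flink-conf.yaml:

```sql
-- Per-job parallelism from the SQL client (what I varied between 12 and 120):
SET 'parallelism.default' = '120';
SET 'table.exec.resource.default-parallelism' = '120';

-- Cluster-level settings, set in flink-conf.yaml rather than per query:
--   taskmanager.numberOfTaskSlots: 4
--   taskmanager.memory.process.size: 4g
--   kubernetes.taskmanager.cpu: 2
```

With 20 pods and 4 slots each, that gives 80 slots for 120 requested parallel subtasks per operator, which is part of what I'm unsure about.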