After checking again, each pod uses about 200–400 millicores and 2.2 GB.

On Wed, Jan 29, 2025 at 9:41 PM, Guillermo Ortiz Fernández (<
guillermo.ortiz.f...@gmail.com>) wrote:

> I have a job written entirely in Flink SQL. The first part of the program
> processes 10 input topics and generates one output topic with normalized
> messages and some filtering applied (quite simple: a few WHERE clauses on
> fields and substring matching). Nine of the topics produce between
> hundreds and thousands of messages per second, with an average of 4–10
> partitions each. The other topic produces 150K messages per second and has
> 500 partitions. They are unioned into the output topic.
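For reference, the general shape of such a job is roughly the following sketch. All table, topic, and column names here are invented for illustration; the actual schema is not shown in the thread.

```sql
-- Hypothetical sketch only: names and filter conditions are assumptions.
INSERT INTO normalized_output
SELECT id, SUBSTRING(payload FROM 1 FOR 64) AS payload
FROM big_topic            -- ~150K msg/s, 500 partitions
WHERE event_type = 'A'
UNION ALL
SELECT id, SUBSTRING(payload FROM 1 FOR 64) AS payload
FROM small_topic_1        -- hundreds to thousands of msg/s
WHERE event_type = 'A';
```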
>
> The average output rate needed to avoid lag after filtering messages
> should be around 60K messages per second. I’ve been testing different
> configurations of parallelism, slots and pods (everything runs on
> Kubernetes), but I’m far from achieving those numbers.
>
> In the latest configuration, I used 20 pods, a parallelism of 120, with 4
> slots per taskmanager. With this setup, I achieve approximately 20K
> messages per second, but I’m unable to consume the largest topic at the
> rate messages are being produced. Additionally, setting parallelism to 120
> creates hundreds of subtasks for the smaller topics, which don’t do much
> but still consume minimal resources even if idle.
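One way to avoid sizing every source for the biggest topic is to keep a low default parallelism and raise it only for the large topic's source table. This is a sketch under assumptions: whether per-source parallelism via `'scan.parallelism'` is available depends on the Flink and Kafka connector versions in use, so verify it against the connector documentation before relying on it.

```sql
-- Baseline parallelism for the small topics.
SET 'parallelism.default' = '12';

-- Hypothetical source definition; topic, servers, and schema are invented.
CREATE TABLE big_topic (
  id STRING,
  payload STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'big-topic',
  'properties.bootstrap.servers' = 'kafka:9092',
  'properties.group.id' = 'normalizer',
  'format' = 'json',
  'scan.parallelism' = '120'   -- assumption: connector supports this option
);
```

With this split, the nine small topics would no longer spawn hundreds of mostly idle subtasks, while the 500-partition topic keeps enough consumers.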
>
> I started with parallelism 12 and got about 1,000–3,000 messages per
> second.
>
> When I check CPU and memory usage on the pods, I don't see any problem:
> they are far from their limits. Each taskmanager has 4 GB and 2 CPUs, and
> they never come close to saturating the CPU.
>
> It’s a requirement to do everything 100% in SQL. How can I improve the
> throughput rate? Should I be concerned about the hundreds of subtasks
> created for the smaller topics?
>
