> The average output rate needed to avoid lag after filtering messages
should be around 60K messages per second. I’ve been testing different
configurations of parallelism, slots and pods (everything runs on
Kubernetes), but I’m far from achieving those numbers.

How are you partitioning your query? Do you see any backpressure happening
in the Flink UI?
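
To make sure we're looking at the same thing, here is a minimal sketch of
the kind of normalize-and-union statement I'm picturing (all table names,
columns, and filters below are assumptions, not your actual job):

-- Hypothetical shape of the job: one big source plus smaller ones,
-- filtered, normalized, and unioned into a single output topic.
INSERT INTO normalized_out
SELECT id, SUBSTRING(payload FROM 1 FOR 256) AS payload
FROM big_topic
WHERE event_type = 'order'
UNION ALL
SELECT id, SUBSTRING(payload FROM 1 FOR 256) AS payload
FROM small_topic_1
WHERE event_type = 'order';

If the big topic's source subtasks show up as backpressured in the UI, that
usually means something downstream (the filter chain or the Kafka sink)
can't keep up, rather than the sources themselves.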

> In the latest configuration, I used 20 pods, a parallelism of 120, with 4
slots per taskmanager.

Were all tasks working properly, or did some of them sit idle?

> Additionally, setting parallelism to 120 creates hundreds of subtasks for
the smaller topics, which don’t do much but still consume minimal resources
even if idle.

On the Table API I'm not sure if you can choose the parallelism per "task".
With the DataStream API I'm sure you can do it.
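
For what it's worth, on the SQL side the knobs I would try look like this.
Setting a default parallelism this way does exist in the Table API; whether
your connector version supports a per-source 'scan.parallelism' option is
an assumption you would have to verify against its documentation:

-- Default parallelism for all operators of the SQL job:
SET 'table.exec.resource.default-parallelism' = '120';

-- Hypothetical per-source override, so the small topics are not spread
-- over hundreds of mostly idle subtasks (check that your Kafka connector
-- version actually supports 'scan.parallelism'):
CREATE TABLE small_topic_1 (
  id STRING,
  payload STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'small-topic-1',
  'properties.bootstrap.servers' = 'kafka:9092',
  'format' = 'json',
  'scan.parallelism' = '4'
);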



Regards,
Pedro Mázala
+31 (06) 3819 3814
Be awesome


On Wed, 29 Jan 2025 at 22:06, Guillermo Ortiz Fernández <
guillermo.ortiz.f...@gmail.com> wrote:

> After the last check, each pod uses about 200-400 millicores and 2.2 GB.
>
> On Wed, 29 Jan 2025 at 21:41, Guillermo Ortiz Fernández (<
> guillermo.ortiz.f...@gmail.com>) wrote:
>
>> I have a job written entirely in Flink SQL. The first part of the program
>> processes 10 input topics and generates one output topic with normalized
>> messages and some filtering applied (really simple: a few WHERE clauses on
>> fields and substring matching). Nine of the topics produce between
>> hundreds and thousands of messages per second, with an average of 4–10
>> partitions each. The other topic produces 150K messages per second and
>> has 500 partitions. They are all unioned into the output topic.
>>
>> The average output rate needed to avoid lag after filtering messages
>> should be around 60K messages per second. I’ve been testing different
>> configurations of parallelism, slots and pods (everything runs on
>> Kubernetes), but I’m far from achieving those numbers.
>>
>> In the latest configuration, I used 20 pods, a parallelism of 120, with 4
>> slots per taskmanager. With this setup, I achieve approximately 20K
>> messages per second, but I’m unable to consume the largest topic at the
>> rate messages are being produced. Additionally, setting parallelism to 120
>> creates hundreds of subtasks for the smaller topics, which don’t do much
>> but still consume minimal resources even if idle.
>>
>> I started out with a parallelism of 12 and got about 1000-3000 messages
>> per second.
>>
>> When I check the CPU and memory usage of the pods, I don't see any
>> problem; they are far from their limits. Each taskmanager has 4 GB and 2
>> CPUs, and they never come close to using the CPU.
>>
>> It’s a requirement to do everything 100% in SQL. How can I improve the
>> throughput rate? Should I be concerned about the hundreds of subtasks
>> created for the smaller topics?
>>
>
