Hi,

We are running Flink 1.15.4 streaming on AWS EKS, with the default
flow control configuration (i.e. floating-buffers-per-gate,
buffers-per-channel, and max-buffers-per-channel are unchanged).

The entire pipeline runs on a single task manager (10 cores, 24 GB RAM,
20 total task slots, 8 available task slots). Pod CPU utilization is at
20% of the pod limit, and pod memory utilization is at 60% of the limit.

A portion of the pipeline topology is shown here
<https://i.sstatic.net/cJ3XCGgY.png>: OpA and OpB both publish messages
to OpC, and all three operators run with parallelism 1.

OpA generates 30 records per minute at 8 MB per record. OpB generates
180K records per minute at 1.8 KB per record.
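For scale, the two inputs carry a comparable byte rate even though the record shapes are very different (my own arithmetic, decimal units assumed):

```python
# Rough per-minute byte rates for the two inputs to OpC.
opa_mb_per_min = 30 * 8                  # 30 records/min * 8 MB each
opb_mb_per_min = 180_000 * 1.8 / 1000    # 180K records/min * 1.8 KB each

print(opa_mb_per_min)  # 240.0 MB/min from OpA
print(opb_mb_per_min)  # 324.0 MB/min from OpB
```

So OpA's channel carries far fewer but much larger records, which I suspect may matter for how network buffers are occupied.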

OpC has no external dependencies and simply forwards the messages
downstream.

The issue is that while OpA and OpC are neither backpressured nor busy,
OpB is 100% backpressured.

I tried much larger flow control settings (floating-buffers-per-gate: 200,
buffers-per-channel: 100, max-buffers-per-channel: 100), but it didn't help.
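In full-key form, what I set was roughly the following (assuming these short names map to the taskmanager.network.memory.* options in 1.15; sketch, not a verbatim copy of my flink-conf.yaml):

```yaml
# flink-conf.yaml (values as tried above)
taskmanager.network.memory.floating-buffers-per-gate: 200
taskmanager.network.memory.buffers-per-channel: 100
taskmanager.network.memory.max-buffers-per-channel: 100
```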

I also tried increasing the parallelism of OpC to 8, which didn't help either.

If I remove the OpA input to OpC, OpB flows much faster, and OpC becomes
busy processing messages from OpB.

What can I do to reconnect OpA -> OpC while keeping OpB from being so
heavily backpressured?

Any help is greatly appreciated!

Thanks,

Brian
