Hello,

I have a Kafka source (the new KafkaSource API) in Flink 1.15 followed by a
process function with parallelism=2. On some days I see long periods of
backpressure in the source. During those times, the pool-usage metrics of
all tasks stay between 0 and 1%, but the process function appears 100% busy.

To try to avoid backpressure, I increased parallelism to 3. It seems to
help, and busy-time decreased to around 80%, but something that caught my
attention is that throughput remained unchanged. Concretely, if X is the
number of events being written to the Kafka topic every second, each
instance of the process function receives roughly X/2 events/s with
parallelism=2, and X/3 with parallelism=3.
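
To make the arithmetic concrete, here is a minimal sketch of what I mean.
The rate of 3000 events/s is a made-up stand-in for X, and the helper names
are hypothetical (they are not Flink APIs). It assumes events are spread
evenly across subtasks and that busy time scales linearly with the
per-subtask rate, which may not hold in practice:

```python
def per_subtask_rate(total_rate, parallelism):
    """Events/s each subtask receives, assuming an even split."""
    return total_rate / parallelism


def expected_busy_ratio(total_rate, parallelism, capacity_per_subtask):
    """Fraction of time a subtask is busy, capped at 1.0 (saturated)."""
    rate = per_subtask_rate(total_rate, parallelism)
    return min(1.0, rate / capacity_per_subtask)


X = 3000.0  # hypothetical ingest rate in events/s, not a measured value

# Assumption: at parallelism=2 each subtask is exactly saturated at X/2.
capacity = per_subtask_rate(X, 2)

for p in (2, 3):
    rate = per_subtask_rate(X, p)
    busy = expected_busy_ratio(X, p, capacity)
    print(f"parallelism={p}: {rate:.0f} events/s per subtask, busy ~{busy:.0%}")
```

Under those assumptions I would expect roughly 67% busy time with
parallelism=3, so the ~80% I observe suggests the per-event cost is not
constant, or that the split is not perfectly even.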

I'm wondering a couple of things.

1. Is it possible that backpressure in this case is essentially a "false
positive", because the function is busy 100% of the time even though it's
keeping up with the incoming data?
2. Does Flink expose any way to tune this type of backpressure mechanism?

Regards,
Alexis.
