Hello, I have a Kafka source (the new one) in Flink 1.15 that's followed by a process function with parallelism=2. On some days, I see long periods of backpressure in the source. During those times, the pool-usage metrics of all tasks stay between 0% and 1%, but the process function appears 100% busy.
To try to avoid backpressure, I increased the parallelism to 3. It seemed to help, and busy time decreased to around 80%, but what caught my attention is that the overall throughput remained unchanged. Concretely, if X is the number of events being written to the Kafka topic every second, each instance of the process function receives roughly X/2 events/s with parallelism=2, and X/3 events/s with parallelism=3.

I'm wondering a couple of things:

1. Is it possible that backpressure in this case is essentially a "false positive", because the function is busy 100% of the time even though it is keeping up with the incoming data?
2. Does Flink expose any way to tune this type of backpressure mechanism?

Regards,
Alexis.