Hi Jason,
If you see a back pressure warning for a task, it means the task is producing
data faster than its downstream operators can consume it.
We should avoid high back pressure in online jobs because it may lead to
the following problems:
1. It points to a potential performance bottleneck and may cause high
latency.
2. With aligned checkpoints, it may cause checkpoint problems (e.g. a long
end-to-end duration), because barrier alignment takes more time under
back pressure.
Please note that the community introduced unaligned checkpoints [1], which
address long checkpoint durations caused by back pressure.
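To make the definition of back pressure above concrete, here is a minimal
sketch under a simplified model: a single producer and a single consumer
connected by a bounded ArrayBlockingQueue stand in for two operators and the
buffers between them. This is not Flink's implementation; it only shows the
general effect that a full bounded buffer forces the faster upstream side to
wait. The class BackPressureDemo and its parameters are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical toy model of back pressure, not Flink code.
public class BackPressureDemo {
    // Pushes `records` items from a fast producer (the calling thread) to a slow
    // consumer thread through a buffer that holds at most `capacity` items.
    // Returns how many times the producer found the buffer full and had to block.
    public static int run(int records, int capacity, long consumerDelayMs)
            throws InterruptedException {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(capacity);
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < records; i++) {
                    buffer.take();                 // consume one record
                    Thread.sleep(consumerDelayMs); // simulate slow downstream processing
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        int blockedCount = 0;
        for (int i = 0; i < records; i++) {
            if (!buffer.offer(i)) { // buffer full: downstream cannot keep up
                blockedCount++;
                buffer.put(i);      // wait until the consumer frees a slot (back pressure)
            }
        }
        consumer.join();
        return blockedCount;
    }

    public static void main(String[] args) throws InterruptedException {
        int waits = run(50, 4, 5);
        System.out.println("Producer was blocked " + waits + " times out of 50 records");
    }
}
```

In this toy model the producer's effective rate is capped by the consumer's
rate: once the buffer is full, producing faster only means waiting longer.
That is why back pressure by itself signals a throughput limit rather than
lost data.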

All in all, it's better to avoid back pressure; please check document [2]
to see how to deal with it. If you can tolerate the back pressure, please
use unaligned checkpoints instead of aligned checkpoints to avoid long
checkpoint durations caused by back pressure.

[1] https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/state/checkpoints/#unaligned-checkpoints
[2] https://flink.apache.org/2019/07/23/flink-network-stack-2.html#backpressure

Best,
JING ZHANG

Jason Liu <jasonli...@ucla.edu> wrote on Thursday, June 17, 2021 at 8:40 AM:

> Hi all,
>
>     We are running Flink on AWS Kinesis Data Analytics. Lately, after the
> upgrade to Flink 1.11, we have noticed that some of our apps show
> continuous backpressure from the moment the Flink job starts. However, we
> have been running these apps for a while now, and if we decrease the source
> parallelism to try to reduce the backpressure, the app's overall throughput
> drops slightly compared to when the source parallelism was higher. Just
> wondering whether it's okay to keep the app configuration as it is
> (tolerating the backpressure), since it's pretty stable and performs well.
>
> Thanks,
> Jason
>
