Re: Failures due to inevitable high backpressure

2020-08-27 Thread Arvid Heise
Hi Hubert, The most straightforward reason for backpressure is under-provisioning of the cluster. An application usually needs gradually more resources over time. If the user base of your company grows, so does the number of messages (be it click stream, page impressions, or any kind of transacti

Re: Failures due to inevitable high backpressure

2020-08-26 Thread David Anderson
One other thought: some users experiencing this have found it preferable to increase the checkpoint timeout to the point where it is effectively infinite. Checkpoints that can't time out are likely to complete eventually, which is better than landing in the vicious cycle you described. David
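
The reasoning in this message can be sketched as a toy model. Everything here is an illustrative assumption rather than anything stated in the thread: the function name, the numbers, and in particular the 1.5x slowdown applied after each timed-out checkpoint, which stands in for a restart that adds backlog and so makes the next checkpoint slower. It models no real system's scheduling, only the shape of the cycle.

```python
def attempts_until_checkpoint(duration_s, timeout_s, max_attempts=5, slowdown=1.5):
    """Return the attempt on which a checkpoint completes, or None if none does.

    Hypothetical model: a checkpoint that exceeds the timeout is discarded,
    and the next attempt takes `slowdown` times longer because the job is
    assumed to have fallen further behind in the meantime.
    """
    for attempt in range(1, max_attempts + 1):
        if duration_s <= timeout_s:
            return attempt  # the checkpoint finished within the timeout
        duration_s *= slowdown  # assumed cost of the failed attempt
    return None  # every attempt timed out: the vicious cycle


# A checkpoint needing 900 s never fits a 600 s timeout in this model...
print(attempts_until_checkpoint(900, 600))
# ...but completes on the first attempt when the timeout is unbounded.
print(attempts_until_checkpoint(900, float("inf")))
```

In this sketch a slow checkpoint is only a problem when a timeout turns it into a failed one, which is the trade-off the message describes.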

Re: Failures due to inevitable high backpressure

2020-08-26 Thread David Anderson
You should begin by trying to identify the cause of the backpressure, because the appropriate fix depends on the details. Possible causes that I have seen include:
- the job is inadequately provisioned
- blocking I/O is being done in a user function
- a huge number of timers are firing simultaneo