Hi Xiangyu,

Can you provide us with more information about your job, such as which
state backend you are using and how you've configured checkpointing? Can
you also provide some information about the problematic checkpoints (e.g.
alignment time, async/sync duration) that you find on the checkpoint
details page? If you have access to the logs, they could also help us
better understand what is going on.

In general, such a problem can be caused by backpressure and long alignment
times. Backpressure can come from skewed data or from user code that
performs very lengthy operations. If the problem is long alignment times
caused by backpressure, you could try enabling unaligned checkpoints.
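
For reference, a minimal sketch of what that could look like in
flink-conf.yaml, assuming Flink 1.13 and the 10-minute timeout you
mentioned. The interval value is only a placeholder and not taken from
your setup:

```yaml
# Sketch only: adjust the values to your job.
# Placeholder interval (assumption, not from the original report).
execution.checkpointing.interval: 1 min
# Matches the 10-minute checkpoint timeout described in the report.
execution.checkpointing.timeout: 10 min
# Lets barriers overtake buffered in-flight data instead of waiting for alignment.
execution.checkpointing.unaligned: true
```

Note that unaligned checkpoints persist in-flight data as part of the
checkpoint, so checkpoint sizes can grow while the job is backpressured.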

Cheers,
Till

On Thu, Sep 2, 2021 at 11:48 AM Xiangyu Su <xian...@smaato.com> wrote:

> Hello Everyone,
> Hello Till,
> We have been facing checkpointing failures since version 1.9; we are
> currently using version 1.13.2.
>
> We are using the filesystem (S3) state backend with a 10-minute checkpoint
> timeout; a checkpoint usually takes 10-30 seconds.
> But sometimes I have seen the job fail and restart due to a checkpoint
> timeout without any large increase in incoming data, and I have also seen
> the checkpointing progress of some subtasks get stuck at e.g. 7% for 10
> minutes. My guess is that the thread doing the checkpointing is somehow
> getting blocked...
>
> Any suggestions or ideas would be helpful, thanks.
>
>
> Best Regards,
>
> --
> Xiangyu Su
> Java Developer
> xian...@smaato.com
>
> Smaato Inc.
> San Francisco - New York - Hamburg - Singapore
> www.smaato.com
>
