Hello everyone, hello Till,

We have been seeing checkpoint failures since version 1.9; we are currently running version 1.13.2.
We use the filesystem state backend (on S3) with a 10-minute checkpoint timeout. A checkpoint usually takes 10-30 seconds. Sometimes, however, the job fails and restarts because of a checkpoint timeout, without any large increase in incoming data. I have also seen the checkpoint progress of some subtasks get stuck at, for example, 7% for the full 10 minutes.

My guess is that the thread doing the checkpointing is somehow getting blocked. Any suggestions or ideas would be helpful, thanks.

Best regards,
--
Xiangyu Su
Java Developer
Smaato Inc.