Hello Everyone,
Hello Till,
We have been facing checkpointing failures since version 1.9; we are
currently using version 1.13.2.

We use the filesystem (S3) state backend with a 10-minute checkpoint
timeout; a checkpoint usually takes 10-30 seconds.
However, the job sometimes fails and restarts because of a checkpoint
timeout, even though there is no large increase in incoming data. I have
also seen the checkpoint progress of some subtasks get stuck at, e.g., 7%
for 10 minutes.
My guess is that the thread doing the checkpointing somehow gets blocked.
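
One way to check this guess would be to look at thread states while a checkpoint is stuck. The following is only a minimal sketch using the standard java.lang.management API: it inspects the JVM it runs in, so a jstack dump of the TaskManager process would be the practical equivalent. The class name, method name, and thread-name filter are hypothetical and are not part of Flink.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class BlockedThreadReport {

    /**
     * Lists threads of the current JVM that are in state BLOCKED and whose
     * name contains the given filter (hypothetical; pass "" to match all).
     */
    static List<String> findBlockedThreads(String nameFilter) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // false/false: skip locked monitors and synchronizers to keep the dump cheap
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // thread terminated while the dump was taken
            }
            if (info.getThreadState() == Thread.State.BLOCKED
                    && info.getThreadName().contains(nameFilter)) {
                report.add(String.format("%s state=%s lock=%s owner=%s",
                        info.getThreadName(),
                        info.getThreadState(),
                        info.getLockName(),
                        info.getLockOwnerName()));
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Print every blocked thread together with the lock it waits for.
        for (String line : findBlockedThreads("")) {
            System.out.println(line);
        }
    }
}
```

If the same thread shows up as blocked on the same lock across several dumps taken a minute apart, that would support the guess; if not, the time is more likely spent elsewhere, for example waiting on S3.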

Any suggestions or ideas would be helpful. Thanks!


Best Regards,

-- 
Xiangyu Su
Java Developer
xian...@smaato.com

Smaato Inc.
San Francisco - New York - Hamburg - Singapore
www.smaato.com

Germany:

Barcastraße 5

22087 Hamburg

Germany
M 0049(176)43330282

