Hi,

I want to raise this question again, since I hit this exception in my production job.
The exception is as follows:

> 2019-11-27 14:47:29 java.lang.RuntimeException: Error while confirming checkpoint
>     at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1205)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: checkpoint completed, but no transaction pending
>     at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
>     at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.notifyCheckpointComplete(TwoPhaseCommitSinkFunction.java:267)
>     at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.notifyCheckpointComplete(AbstractUdfStreamOperator.java:130)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:822)
>     at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1200)
>     ... 5 more

And these are the checkpoints / savepoints taken before the job failed.

[image: checkoint.png]

It seems that the notification for checkpoint #675 handled the pending transaction holder of savepoint #674, but the notification for savepoint #674 was neither subsumed nor ignored by the JM. As a result, during checkpoint #676 some tasks received a notification before receiving the checkpoint barrier, and this led to the exception because there was no pending transaction in the queue.

Does anyone know the details of the mechanism for subsumed notifications and how the checkpoint coordinator handles this situation? Please correct me if I'm wrong.

Thanks.
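To make the suspected sequence concrete, here is a minimal sketch of how I understand it. This is a simplified model and not Flink's actual code: the class PendingTxnModel, its fields, and the "txn-N" labels are hypothetical names I made up for illustration. Only the exception message is taken from the stack trace above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Simplified stand-in for a two-phase-commit sink (hypothetical, NOT Flink's implementation).
class PendingTxnModel {
    // Pre-committed transactions, keyed by the checkpoint that pre-committed them.
    private final TreeMap<Long, String> pending = new TreeMap<>();
    private final List<String> committed = new ArrayList<>();
    private String current = "txn-0";
    private int nextTxn = 1;

    // Barrier arrives: pre-commit the current transaction and open a new one.
    void snapshotState(long checkpointId) {
        pending.put(checkpointId, current);
        current = "txn-" + nextTxn++;
    }

    // Completion notification: commit everything pre-committed up to checkpointId.
    void notifyCheckpointComplete(long checkpointId) {
        if (pending.isEmpty()) {
            throw new IllegalStateException("checkpoint completed, but no transaction pending");
        }
        NavigableMap<Long, String> done = pending.headMap(checkpointId, true);
        committed.addAll(done.values());
        done.clear(); // clearing the view also removes the entries from 'pending'
    }

    List<String> committed() {
        return committed;
    }

    public static void main(String[] args) {
        PendingTxnModel sink = new PendingTxnModel();
        sink.snapshotState(674);            // savepoint #674
        sink.snapshotState(675);            // checkpoint #675
        sink.notifyCheckpointComplete(675); // commits the transactions of #674 and #675
        try {
            // Late notification for #674, arriving before the barrier of #676.
            sink.notifyCheckpointComplete(674);
        } catch (IllegalStateException e) {
            System.out.println("late notification failed: " + e.getMessage());
        }
    }
}
```

In this model the notification for #675 empties the pending map, so a later notification for #674 finds nothing to commit unless the barrier of #676 has already added a new entry. Whether the real coordinator can deliver notifications in this order is exactly what I am asking about.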
Best,
Tony Wei

On Mon, Oct 8, 2018 at 5:03 PM, Stefan Richter <s.rich...@data-artisans.com> wrote:

> Hi Pedro,
>
> unfortunately the interesting parts are all removed from the log, we already know about the exception itself. In particular, what I would like to see is what checkpoints have been triggered and completed before the exception happens.
>
> Best,
> Stefan
>
> > On 08.10.2018 at 10:23, PedroMrChaves <pedro.mr.cha...@gmail.com> wrote:
> >
> > Hello,
> >
> > Find attached the jobmanager.log. I've omitted the log lines from other runs, only left the job manager info and the run with the error.
> >
> > jobmanager.log <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t612/jobmanager.log>
> >
> > Thanks again for your help.
> >
> > Regards,
> > Pedro.
> >
> > -----
> > Best Regards,
> > Pedro Chaves
> > --
> > Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/