Hi,

I would like to raise this question again, since I have hit the same
exception in my production job.

The exception is as follows:


> 2019-11-27 14:47:29
> java.lang.RuntimeException: Error while confirming checkpoint
>     at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1205)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: checkpoint completed, but no transaction pending
>     at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
>     at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.notifyCheckpointComplete(TwoPhaseCommitSinkFunction.java:267)
>     at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.notifyCheckpointComplete(AbstractUdfStreamOperator.java:130)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:822)
>     at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1200)
>     ... 5 more


These are the checkpoints and savepoints taken before the job failed:
[image: checkoint.png]

It seems that the notification for checkpoint #675 already handled the
pending transaction holder of savepoint #674, but the notification for
savepoint #674 was neither subsumed nor ignored by the JM.
Therefore, during checkpoint #676, some tasks received a notification before
receiving the checkpoint barrier, and this exception was thrown because
there was no pending transaction left in the queue.
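
To make the suspected sequence concrete, here is a minimal sketch of how I
understand the pending-transaction bookkeeping. It is a simplified model, not
the actual TwoPhaseCommitSinkFunction source: the class name
PendingTransactionModel, its fields, and the "txn-N" labels are made up for
illustration. The only parts taken from the trace above are the method name
notifyCheckpointComplete and the exception message. The assumption is that a
notification for checkpoint N commits every pending transaction with an id
up to and including N, and that the state check fails when nothing is pending.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified model of the bookkeeping (hypothetical, not Flink code).
class PendingTransactionModel {
    // checkpoint id -> transaction that was pre-committed for that checkpoint
    private final TreeMap<Long, String> pendingCommits = new TreeMap<>();
    private long txnCounter = 0;
    private String currentTxn = "txn-0";
    int committed = 0;

    // Barrier for checkpointId arrived: park the current transaction as
    // pending and open a new one for the records that follow.
    void snapshotState(long checkpointId) {
        pendingCommits.put(checkpointId, currentTxn);
        currentTxn = "txn-" + (++txnCounter);
    }

    // Completion notification for checkpointId arrived.
    void notifyCheckpointComplete(long checkpointId) {
        if (pendingCommits.isEmpty()) {
            // Same message as in the stack trace above.
            throw new IllegalStateException(
                "checkpoint completed, but no transaction pending");
        }
        // Commit everything up to and including checkpointId, so an earlier
        // checkpoint whose notification has not arrived yet is covered too.
        Map<Long, String> done = pendingCommits.headMap(checkpointId, true);
        committed += done.size();
        done.clear();
    }

    int pendingCount() {
        return pendingCommits.size();
    }

    public static void main(String[] args) {
        PendingTransactionModel sink = new PendingTransactionModel();
        sink.snapshotState(674L);            // savepoint #674
        sink.snapshotState(675L);            // checkpoint #675
        sink.notifyCheckpointComplete(675L); // commits #674 and #675
        System.out.println("committed=" + sink.committed
            + " pending=" + sink.pendingCount());
        try {
            // Late notification for #674, before the barrier of #676.
            sink.notifyCheckpointComplete(674L);
        } catch (IllegalStateException e) {
            System.out.println("late notification failed: " + e.getMessage());
        }
    }
}
```

In this model the late notification for #674 finds an empty queue only because
the barrier for #676 has not yet added a new pending transaction, which matches
what I think happened in my job.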

Does anyone know the details of the mechanism for subsumed notifications and
how the checkpoint coordinator handles this situation? Please correct me if
I'm wrong. Thanks.

Best,
Tony Wei

Stefan Richter <s.rich...@data-artisans.com> wrote on Mon, Oct 8, 2018 at 5:03 PM:

> Hi Pedro,
>
> Unfortunately, the interesting parts have all been removed from the log; we
> already know about the exception itself. In particular, what I would like
> to see is which checkpoints were triggered and completed before the
> exception happened.
>
> Best,
> Stefan
>
> > On 08.10.2018 at 10:23, PedroMrChaves <pedro.mr.cha...@gmail.com> wrote:
> >
> > Hello,
> >
> > Find attached the jobmanager.log. I've omitted the log lines from other
> > runs, only left the job manager info and the run with the error.
> >
> > jobmanager.log
> > <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t612/jobmanager.log>
> >
> >
> >
> > Thanks again for your help.
> >
> > Regards,
> > Pedro.
> >
> >
> >
> > -----
> > Best Regards,
> > Pedro Chaves
>
>
