[ https://issues.apache.org/jira/browse/FLINK-18641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166057#comment-17166057 ]
Biao Liu commented on FLINK-18641: ---------------------------------- [~becket_qin], [~pnowojski], the ticket has not been assigned yet. Is there anyone working on this? > "Failure to finalize checkpoint" error in MasterTriggerRestoreHook > ------------------------------------------------------------------ > > Key: FLINK-18641 > URL: https://issues.apache.org/jira/browse/FLINK-18641 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing > Affects Versions: 1.11.0 > Reporter: Brian Zhou > Priority: Major > > https://github.com/pravega/flink-connectors is a Pravega connector for Flink. > The ReaderCheckpointHook[1] class uses the Flink `MasterTriggerRestoreHook` > interface to trigger the Pravega checkpoint during Flink checkpoints to make > sure the data recovery. The checkpoint recovery tests are running fine in > Flink 1.10, but it has below issues in Flink 1.11 causing the tests time out. > Suspect it is related to the checkpoint coordinator thread model changes in > Flink 1.11 > Error stacktrace: > {code} > 2020-07-09 15:39:39,999 30945 [jobmanager-future-thread-5] WARN > o.a.f.runtime.jobmaster.JobMaster - Error while processing checkpoint > acknowledgement message > org.apache.flink.runtime.checkpoint.CheckpointException: Could not finalize > the pending checkpoint 3. Failure reason: Failure to finalize checkpoint. > at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:1033) > at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:948) > at > org.apache.flink.runtime.scheduler.SchedulerBase.lambda$acknowledgeCheckpoint$4(SchedulerBase.java:802) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.flink.util.SerializedThrowable: Pending checkpoint has > not been fully acknowledged yet > at > org.apache.flink.util.Preconditions.checkState(Preconditions.java:195) > at > org.apache.flink.runtime.checkpoint.PendingCheckpoint.finalizeCheckpoint(PendingCheckpoint.java:298) > at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:1021) > ... 9 common frames omitted > {code} > More detail in this mailing thread: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Pravega-connector-cannot-recover-from-the-checkpoint-due-to-quot-Failure-to-finalize-checkpoint-quot-td36652.html > Also in https://github.com/pravega/flink-connectors/issues/387 -- This message was sent by Atlassian Jira (v8.3.4#803005)