Hi,
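I guess you are using a fixed JOB_ID and have configured the same
checkpoint dir as in a previous run?
And you may also have started the job without restoring the previous
state?
In that case the new job knows nothing about the earlier checkpoints. Its
checkpoint numbering starts from 1 again (see "checkpoint 1" in your log),
so it tries to write chk-1/_metadata under the same job directory, where a
file from the previous run already exists. That is why the new job fails
when it tries to finalize a new checkpoint.
I'd suggest using a different JOB_ID for each job, or setting a different
checkpoint dir for the new job.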
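
To illustrate the second option, here is a rough sketch in plain Java. It
is not Flink API: the class and method names are made up for this mail,
and the per-run sub-directory naming is only an assumption. The path
layout <checkpoint dir>/<job id>/chk-<n>/_metadata is the one shown in
your stack trace. The sketch only checks whether a fresh job would hit an
existing chk-1/_metadata and, if so, picks a separate directory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Flink: it only reasons about paths.
final class CheckpointDirCheck {

    // Layout seen in the stack trace: <checkpointDir>/<jobId>/chk-<n>/_metadata
    static Path metadataPath(Path checkpointDir, String jobId, long checkpointId) {
        return checkpointDir.resolve(jobId)
                .resolve("chk-" + checkpointId)
                .resolve("_metadata");
    }

    // True if a job started without previous state (first checkpoint is 1,
    // as in the log) would find an existing metadata file in its way.
    static boolean wouldCollide(Path checkpointDir, String jobId) {
        return Files.exists(metadataPath(checkpointDir, jobId, 1L));
    }

    // Assumed convention: one sub-directory per run, e.g. labelled by start time.
    static Path perRunCheckpointDir(Path baseDir, String runLabel) {
        return baseDir.resolve(runLabel);
    }

    public static void main(String[] args) {
        Path base = Paths.get("/opt/flink/pm/checkpoint");
        String jobId = "000000006e6b13320000000000000000";

        Path dir = base;
        if (wouldCollide(base, jobId)) {
            // Stale chk-1/_metadata from an earlier run: use a fresh directory.
            dir = perRunCheckpointDir(base, "run-" + System.currentTimeMillis());
        }
        System.out.println("checkpoint dir to configure: " + dir.toUri());
    }
}
```

The resulting directory would then be the value you configure as the
checkpoint dir (state.checkpoints.dir) for the new job. If you actually
want the new job to continue from the old state, restore from the last
checkpoint or a savepoint instead of starting empty.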

On Tue, May 9, 2023 at 9:38 PM amenreet sodhi <amenso...@gmail.com> wrote:

> Hi all,
>
> Is there any way to prevent a Flink job from restarting, or to override
> the checkpoint metadata, if for some reason a checkpoint with the same
> name already exists? I get the following exception and my job restarts.
> I have been trying to find a solution for a very long time but haven't
> found anything useful yet, other than cleaning up manually.
>
> 2023-02-27 10:00:50,360 WARN  
> org.apache.flink.runtime.checkpoint.CheckpointFailureManager
> [] - Failed to trigger or complete checkpoint 1 for job
> 000000006e6b13320000000000000000. (0 consecutive failed attempts so far)
>
> org.apache.flink.runtime.checkpoint.CheckpointException: Failure to
> finalize checkpoint.
>
> at
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.finalizeCheckpoint(CheckpointCoordinator.java:1375)
> ~[event_executor-1.0-SNAPSHOT.jar:?]
>
> at
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:1265)
> ~[event_executor-1.0-SNAPSHOT.jar:?]
>
> at
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:1157)
> ~[event_executor-1.0-SNAPSHOT.jar:?]
>
> at
> org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$acknowledgeCheckpoint$1(ExecutionGraphHandler.java:89)
> ~[event_executor-1.0-SNAPSHOT.jar:?]
>
> at
> org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$processCheckpointCoordinatorMessage$3(ExecutionGraphHandler.java:119)
> ~[event_executor-1.0-SNAPSHOT.jar:?]
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> [?:?]
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> [?:?]
>
> at java.lang.Thread.run(Thread.java:834) [?:?]
>
> Caused by: java.io.IOException: Target file
> file:/opt/flink/pm/checkpoint/000000006e6b13320000000000000000/chk-1/_metadata
> already exists.
>
> at
> org.apache.flink.runtime.state.filesystem.FsCheckpointMetadataOutputStream.getOutputStreamWrapper(FsCheckpointMetadataOutputStream.java:168)
> ~[event_executor-1.0-SNAPSHOT.jar:?]
>
> at
> org.apache.flink.runtime.state.filesystem.FsCheckpointMetadataOutputStream.<init>(FsCheckpointMetadataOutputStream.java:64)
> ~[event_executor-1.0-SNAPSHOT.jar:?]
>
> at
> org.apache.flink.runtime.state.filesystem.FsCheckpointStorageLocation.createMetadataOutputStream(FsCheckpointStorageLocation.java:109)
> ~[event_executor-1.0-SNAPSHOT.jar:?]
>
> at
> org.apache.flink.runtime.checkpoint.PendingCheckpoint.finalizeCheckpoint(PendingCheckpoint.java:332)
> ~[event_executor-1.0-SNAPSHOT.jar:?]
>
> at
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.finalizeCheckpoint(CheckpointCoordinator.java:1361)
> ~[event_executor-1.0-SNAPSHOT.jar:?]
>
> ... 7 more
>
> 2023-02-27 10:00:50,374 WARN  org.apache.flink.runtime.jobmaster.JobMaster
>                 [] - Error while processing AcknowledgeCheckpoint message
>
> org.apache.flink.runtime.checkpoint.CheckpointException: Could not
> finalize the pending checkpoint 1. Failure reason: Failure to finalize
> checkpoint.
>
> at
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.finalizeCheckpoint(CheckpointCoordinator.java:1381)
> ~[event_executor-1.0-SNAPSHOT.jar:?]
>
> at
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:1265)
> ~[event_executor-1.0-SNAPSHOT.jar:?]
>
> at
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:1157)
> ~[event_executor-1.0-SNAPSHOT.jar:?]
>
> at
> org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$acknowledgeCheckpoint$1(ExecutionGraphHandler.java:89)
> ~[event_executor-1.0-SNAPSHOT.jar:?]
>
> at
> org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$processCheckpointCoordinatorMessage$3(ExecutionGraphHandler.java:119)
> ~[event_executor-1.0-SNAPSHOT.jar:?]
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> [?:?]
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> [?:?]
>
> at java.lang.Thread.run(Thread.java:834) [?:?]
>
> Caused by: java.io.IOException: Target file
> file:/opt/flink/pm/checkpoint/000000006e6b13320000000000000000/chk-1/_metadata
> already exists.
>
> at
> org.apache.flink.runtime.state.filesystem.FsCheckpointMetadataOutputStream.getOutputStreamWrapper(FsCheckpointMetadataOutputStream.java:168)
> ~[event_executor-1.0-SNAPSHOT.jar:?]
>
>
> Please let me know if anyone knows how to resolve this issue.
>
> Thanks and Regards
>
> Amenreet Singh Sodhi
>
>
>

-- 
Best,
Hangxiang.
