Re: Clusters restarting without checkpoints from HA metadata

2025-03-05 Thread Max Feng
. Are there any obvious reasons we should not do this?
- Max
On Sun, Mar 2, 2025 at 5:41 PM Max Feng wrote:
> Hi,
>
> We're running Flink 1.20, native kubernetes application-mode clusters, and
> we're running into an issue where clusters are restarting without
> checkp

Clusters restarting without checkpoints from HA metadata

2025-03-02 Thread Max Feng
high availability data for job .
[4] No checkpoint found during restore.
Best,
Max Feng

Losing externalized checkpoint reference in certain failure modes

2024-08-19 Thread Max Feng
tempt to resume from previous state, as the checkpoint was no longer referenced. We understand the root cause of the operator error, but we would expect that the externalized checkpoint reference would be retained in this failure mode. Has anyone else run into this issue?
Best,
Max Feng

Partially created savepoint directories

2024-08-06 Thread Max Feng
Hi,
We're running Flink 1.15 in application mode. When stopping a job with a savepoint, we encountered a case where a savepoint directory was created but was empty (we're storing the savepoint on S3; the empty directory marker exists, but neither the _metadata file nor any state files exist). What can
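For illustration only, here is a minimal sketch of how the partially created savepoint described above might be detected: a savepoint directory is treated as complete only if a `_metadata` object exists directly under its prefix. The function name and the example keys are hypothetical and do not come from the thread. The check works on a plain list of object keys and makes no S3 calls.

```python
def savepoint_is_complete(object_keys, savepoint_prefix):
    """Return True if a `_metadata` object exists directly under the prefix.

    Assumption: `object_keys` is a listing of object keys already fetched
    from the bucket by whatever client is in use; nothing here talks to S3.
    """
    # Normalize the prefix so "dir" and "dir/" are treated the same.
    prefix = savepoint_prefix.rstrip("/") + "/"
    return (prefix + "_metadata") in set(object_keys)


# Hypothetical listing: only the empty directory marker exists.
empty_listing = ["savepoints/savepoint-example/"]
# Hypothetical listing: metadata plus one state file.
full_listing = [
    "savepoints/savepoint-example/_metadata",
    "savepoints/savepoint-example/state-file-1",
]
```

A check like this could run after a stop-with-savepoint call returns and before the job is resumed from that path. Whether that is sufficient depends on how the savepoint failed, which the snippet above does not say.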