Hi Jonas,

If you restart the Flink cluster by deleting and re-creating the deployment directly, the job is automatically restored from the latest checkpoint [1], so enabling checkpointing may be enough.

If you want to use savepoints instead, you need to check that the latest savepoint completed successfully. Checking for a _metadata file in the savepoint directory works in most scenarios, but in some cases the _metadata file may be incomplete.
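As a rough illustration of the _metadata check, here is a minimal Python sketch. It assumes the savepoint root is reachable as a local path (for an S3 bucket you would list objects with your S3 client instead) and that savepoint directories are named with a "savepoint-" prefix. The function name latest_complete_savepoint is hypothetical. A non-empty _metadata file is only a heuristic; as noted above, it does not prove the savepoint is complete.

```python
from pathlib import Path
from typing import Optional


def latest_complete_savepoint(root: Path) -> Optional[Path]:
    """Return the newest savepoint dir under `root` with a non-empty _metadata file.

    Hypothetical helper: assumes a local (or mounted) directory layout and the
    "savepoint-" directory name prefix.
    """
    candidates = []
    for entry in root.iterdir():
        # Skip anything that does not look like a savepoint directory.
        if not entry.is_dir() or not entry.name.startswith("savepoint-"):
            continue
        metadata = entry / "_metadata"
        # Heuristic only: the file exists and is not empty.
        if metadata.is_file() and metadata.stat().st_size > 0:
            candidates.append((metadata.stat().st_mtime, entry))
    if not candidates:
        return None
    # Pick the savepoint whose _metadata file was written most recently.
    return max(candidates, key=lambda c: c[0])[1]
```

A restore script could call this and fall back to the latest checkpoint when it returns None.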
[1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/

Best,
Weihua

On Tue, Jul 5, 2022 at 10:54 PM jonas eyob <[email protected]> wrote:

> Hi!
>
> We are running a Standalone job on Kubernetes using application deployment
> mode, with HA enabled.
>
> We have attempted to automate how we create and restore savepoints by
> running a script for generating a savepoint (using k8 preStop hook) and
> another one for restoring from a savepoint (located in a S3 bucket).
>
> Restoring from a savepoint is typically not a problem once we have a
> savepoint generated and accessible in our s3 bucket. The problem is
> generating the savepoint which hasn't been very reliable thus far. Logs are
> not particularly helpful either so we wanted to rethink how we go about
> taking savepoints.
>
> Are there any best practices for doing this in a CI/CD manner given our
> setup?
>
> --
