Hi!

Did you check the Flink Kubernetes Operator
(https://github.com/apache/flink-kubernetes-operator) by any chance?

It provides many of the application lifecycle features you are probably
after straight out of the box. The latest upcoming version also includes
both manual and periodic savepoint triggering :)
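For example, with the operator a manual savepoint can be requested
declaratively on the deployment resource. The fragment below is only a
minimal sketch: the resource name, jar path, and bucket are made-up
placeholders, and the exact fields should be checked against the operator
docs for the version you run:

```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: my-app                        # placeholder name
spec:
  job:
    jarURI: local:///opt/flink/usrlib/my-app.jar   # placeholder jar
    upgradeMode: savepoint
    # Bumping this nonce asks the operator to trigger a manual savepoint.
    savepointTriggerNonce: 1
  flinkConfiguration:
    state.savepoints.dir: s3://my-bucket/savepoints   # placeholder bucket
```

With upgradeMode set to savepoint, the operator also takes a savepoint on
spec upgrades, which covers the CI/CD case without custom scripting.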

Cheers,
Gyula

On Tue, Jul 5, 2022 at 5:34 PM Weihua Hu <huweihua....@gmail.com> wrote:

> Hi, Jonas
>
> If you restart the Flink cluster by deleting/recreating the deployment
> directly, it will automatically restore from the latest checkpoint [1],
> so enabling checkpointing may be enough.
> But if you want to use savepoints, you need to check whether the latest
> savepoint completed successfully (checking for a _metadata file in the
> savepoint directory works in most scenarios, but in some cases the
> _metadata file may not be complete).
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/
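A completeness check along the lines Weihua describes could be scripted
roughly as below. This is a sketch under assumptions: the function name and
paths are made up, and (as noted above) a non-empty _metadata file is a
useful heuristic, not a full guarantee that the savepoint is usable:

```shell
#!/bin/sh
# Sketch: decide whether a savepoint directory looks complete by
# checking that it contains a non-empty _metadata file. This heuristic
# can produce false positives in rare cases.
check_savepoint() {
  dir="$1"
  if [ -s "${dir}/_metadata" ]; then
    echo "complete"
  else
    echo "incomplete"
  fi
}
```

For savepoints in S3 you would list the object keys instead (e.g. with the
AWS CLI) before deciding whether to restore from that path.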
>
> Best,
> Weihua
>
>
> On Tue, Jul 5, 2022 at 10:54 PM jonas eyob <jonas.e...@gmail.com> wrote:
>
>> Hi!
>>
>> We are running a standalone job on Kubernetes using application
>> deployment mode, with HA enabled.
>>
>> We have attempted to automate how we create and restore savepoints by
>> running one script to generate a savepoint (via a Kubernetes preStop
>> hook) and another to restore from a savepoint (located in an S3 bucket).
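A preStop hook of the kind described might be wired up as in the fragment
below. This is only a sketch under assumptions: the container name and
script path are placeholders, and the script itself would typically call
the JobManager's stop-with-savepoint REST endpoint:

```yaml
# Pod template fragment: run a savepoint script before the container stops.
containers:
  - name: flink-main-container        # placeholder container name
    lifecycle:
      preStop:
        exec:
          # Placeholder script; it would trigger a savepoint, e.g. via
          # POST /jobs/<job-id>/stop on the JobManager REST API.
          command: ["/bin/sh", "-c", "/opt/scripts/take-savepoint.sh"]
```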
>>
>> Restoring from a savepoint is typically not a problem once a savepoint
>> has been generated and is accessible in our S3 bucket. The problem is
>> generating the savepoint, which hasn't been very reliable so far. The
>> logs are not particularly helpful either, so we wanted to rethink how
>> we take savepoints.
>>
>> Are there any best practices for doing this in a CI/CD manner given our
>> setup?
>>