Hi Michal,

I am glad you found this new feature interesting, and I hope it will be useful
if you plan to use it.

1. I am not sure when or how the deprecated fields will be removed; that
should happen once the community is satisfied with the new FlinkStateSnapshot
CRD. Regarding the checkpoint path: the path is not part of the response when
the operator retrieves the checkpoint status [1]. Instead, it is fetched with a
separate request [2] after the checkpoint is marked as completed. However, the
checkpoint history is limited (web.checkpoints.history), so by the time the
operator requests the path, the history may already have dropped that
checkpoint ID. In that case, the operator still marks the checkpoint as
COMPLETED, but leaves the path blank [3].
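The lookup race described in point 1 can be illustrated with a small
simulation. This is a hypothetical model, not the operator's code:
`CheckpointHistory` and `observe_completed_checkpoint` are made-up names, the
`max_size` parameter only stands in for web.checkpoints.history, and the
storage paths are invented. The real observer logic is in [3].

```python
from collections import OrderedDict
from typing import Optional


class CheckpointHistory:
    """Hypothetical model of a bounded checkpoint history.

    Only the most recent `max_size` checkpoints are kept (standing in for
    web.checkpoints.history); older entries are evicted.
    """

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._entries: OrderedDict = OrderedDict()

    def record(self, checkpoint_id: int, external_path: str) -> None:
        self._entries[checkpoint_id] = external_path
        # Evict the oldest checkpoints once the history is full.
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)

    def details(self, checkpoint_id: int) -> Optional[str]:
        # Stand-in for the checkpoint details request [2]; None means the
        # checkpoint ID has already been dropped from the history.
        return self._entries.get(checkpoint_id)


def observe_completed_checkpoint(history: CheckpointHistory, checkpoint_id: int) -> dict:
    """Build a snapshot status after the trigger status [1] reports completion."""
    path = history.details(checkpoint_id)
    # The snapshot is marked COMPLETED either way; only the path may be blank.
    return {"state": "COMPLETED", "path": path or ""}


history = CheckpointHistory(max_size=2)
for cid in (1, 2, 3):
    history.record(cid, f"s3://bucket/checkpoints/chk-{cid}")
print(observe_completed_checkpoint(history, 3))  # path still available
print(observe_completed_checkpoint(history, 1))  # evicted, so path is ""
```

In this sketch, checkpoint 1 is evicted once checkpoint 3 is recorded, so its
status is still COMPLETED but carries an empty path.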

2. One of the main benefits is that this approach lets users create and
manage their snapshots with the Kubernetes workflow they are already familiar
with. Having snapshot info in the FlinkDeployment/FlinkSessionJob CR is handy
for taking a quick snapshot, but some users want a more sophisticated way to
manage snapshots. With separate resources, it is easier to list and manage the
snapshots of any or all of their deployments.

3. If you have periodic checkpointing enabled, snapshot.resource.enabled is
set to true, and the FlinkStateSnapshot CRD is installed on your Kubernetes
cluster, the FlinkDeployment/FlinkSessionJob reconciler should create a new
FlinkStateSnapshot resource with an empty status at each interval. The
StateSnapshotReconciler is then notified of the new resource, triggers the
checkpoint, and updates the status fields when the checkpoint finishes.
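The two-step flow in point 3 can be pictured with a simplified simulation.
Everything here is hypothetical: the class and function names are invented for
illustration, time is a plain integer, the settings fields only stand in for
the real configuration keys, and the checkpoint is treated as completing
immediately, whereas the real StateSnapshotReconciler triggers it and updates
the status when it finishes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class OperatorSettings:
    checkpoint_interval: int         # stands in for kubernetes.operator.periodic.checkpoint.interval
    snapshot_resource_enabled: bool  # stands in for snapshot.resource.enabled
    crd_installed: bool              # whether the FlinkStateSnapshot CRD is installed


@dataclass
class SnapshotResource:
    """Hypothetical stand-in for a FlinkStateSnapshot resource."""
    name: str
    job_name: str
    state: Optional[str] = None  # empty status until the snapshot reconciler handles it
    path: Optional[str] = None


@dataclass
class Cluster:
    snapshots: List[SnapshotResource] = field(default_factory=list)


def reconcile_deployment(cluster: Cluster, job_name: str, now: int,
                         last_trigger: int, settings: OperatorSettings) -> int:
    """Deployment-side step: create an empty snapshot resource once the interval
    has elapsed. Returns the updated last-trigger time."""
    if not (settings.snapshot_resource_enabled and settings.crd_installed):
        return last_trigger  # this sketch creates no snapshot resource in that case
    if now - last_trigger >= settings.checkpoint_interval:
        cluster.snapshots.append(
            SnapshotResource(name=f"{job_name}-checkpoint-{now}", job_name=job_name))
        return now
    return last_trigger


def reconcile_snapshot(resource: SnapshotResource,
                       trigger_checkpoint: Callable[[str], str]) -> None:
    """Snapshot-side step: trigger the checkpoint for a new resource and fill in
    its status. Simplification: the checkpoint completes immediately."""
    if resource.state is None:
        resource.path = trigger_checkpoint(resource.job_name)
        resource.state = "COMPLETED"


cluster = Cluster()
settings = OperatorSettings(checkpoint_interval=20, snapshot_resource_enabled=True,
                            crd_installed=True)
last = 0
for now in range(0, 70, 10):  # simulated reconcile loop at t = 0, 10, ..., 60
    last = reconcile_deployment(cluster, "my-job", now, last, settings)
for snapshot in cluster.snapshots:
    reconcile_snapshot(snapshot, lambda job: f"s3://bucket/{job}/chk")
print([(s.name, s.state) for s in cluster.snapshots])
```

In this sketch, disabling the resource setting or omitting the CRD means the
deployment-side step creates nothing, so there is no resource for the snapshot
step to fill in.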

If you have any other questions, please let me know. I hope that you will
be able to find this feature useful.

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/ops/rest_api/#jobs-jobid-checkpoints-triggerid
[2]
https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/ops/rest_api/#jobs-jobid-checkpoints-details-checkpointid
[3]
https://github.com/apache/flink-kubernetes-operator/blob/091e803a6ae713ebe839742694ab6ca53249c4dd/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/observer/snapshot/StateSnapshotObserver.java#L151

Michas Szacillo (BLOOMBERG/ 919 3RD A) <mszaci...@bloomberg.net> wrote
(time: Wed, 18 Dec 2024, 0:11):

> Hi all! I recently came across the FlinkStateSnapshot feature, which I
> found quite interesting, but I have a couple of questions about its use.
>
> 1. I see that both checkpointInfo and savepointInfo in the JobStatus of
> FlinkDeployment have been deprecated in favor of FlinkStateSnapshots. Does
> this mean these fields will eventually be removed completely? Additionally,
> would the community consider adding a path field to the existing
> checkpointInfo? Currently it only shows the triggerId, which isn't very
> helpful when trying to find the actual checkpoint path.
>
> 2. For FlinkStateSnapshots, what is the major benefit of moving the
> checkpoint and savepoint info out of the FlinkDeployment status? I can
> see the benefit of some separation, but I feel like I may be missing
> additional context.
>
> 3. Do FlinkStateSnapshots trigger checkpoints by default if periodic
> checkpointing is enabled? I was using Flink v1_17 and recently tried
> setting kubernetes.operator.periodic.checkpoint.interval. Although the
> Flink Operator does trigger periodic checkpoints, I did not see the
> FlinkStateSnapshot status get reconciled. I had assumed this status would be
> populated automatically if the snapshot was configured as in the
> documentation:
> https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/snapshots/#checkpoint.
>
>
> Appreciate the help!
>
> - Michal
