Thanks for sharing this work, Gyula! It's great to see the FLIP covers
some of the limitations already. I will follow the FLIP and associated JIRA
ticket.
Hi Matthias Pohl, I'd be interested to learn whether there has been any
progress on FLIP-360 or the associated JIRA issue, FLINK-31709.
On Fri, Mar
I agree, we would need some FLIPs to cover this. Actually, there is already
some work on this topic initiated by Matthias Pohl (cc'd).
Please see this:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-360%3A+Merging+the+ExecutionGraphInfoStore+and+the+JobResultStore+into+a+single+component+Comp
No worries, thanks for the reply Gyula.
Ah yes, I see how those points you raised make the feature tricky to
implement.
Could this be considered for a FLIP (or two) in the future?
On Wed, Mar 20, 2024 at 2:21 PM Gyula Fóra wrote:
> Sorry for the late reply Kevin.
>
> I think what you are sugges
Sorry for the late reply Kevin.
I think what you are suggesting makes sense; it would basically be a
`last-state` startup mode. This would also help in cases where the current
last-state mechanism fails to locate HA metadata (and the state).
This is somewhat of a tricky feature to implement:
1.
Thanks for your response, Gyula. Yes, I understand: it doesn't really fit
nicely into the Kubernetes Operator pattern.
I do still wonder about the idea of supporting a feature where upon first
deploy, Flink Operator optionally (flag/param enabled) finds the most
recent snapshot (in a specified objec
Hey Kevin!
The general mismatch I see here is that operators and resources are pretty
cluster dependent. The operator itself is running in the same cluster, so it
feels out of scope to submit resources to different clusters; this doesn't
really sound like what any Kubernetes Operator should do in g
Hi Max,
It feels a bit hacky to need to back up the resources directly from the
cluster, as opposed to being able to redeploy our checked-in k8s manifests
such that they fail over correctly, but that makes sense to me and we can
look into this approach. Thanks for the suggestion!
I'd still be inte
Hi Kevin,
Theoretically, as long as you move over all k8s resources, failover
should work fine on the Flink and Flink Operator side. The tricky part
is the handover. You will need to back up all resources from the old
cluster, shut down the old cluster, then re-create them on the new
cluster. The op
Another thought could be modifying the operator to have a behaviour where
upon first deploy, it optionally (flag/param enabled) finds the most recent
snapshot and uses that as the initialSavepointPath to restore and run the
Flink job.
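
To make the idea a bit more concrete, here is a rough sketch in Python of the
selection step only. It is not operator code: the `Snapshot` type and the
helper names are hypothetical, and how the snapshots would actually be listed
from storage is left out. It only assumes that each candidate snapshot has a
path and a modification time, and that the chosen path ends up in the job
spec's `initialSavepointPath`.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record for one savepoint/checkpoint found in storage."""
    path: str           # location of the snapshot (assumed to be a URI)
    modified_at: float  # modification time, seconds since the epoch


def latest_snapshot(snapshots: Iterable[Snapshot]) -> Optional[Snapshot]:
    """Return the most recently modified snapshot, or None if there are none."""
    return max(snapshots, key=lambda s: s.modified_at, default=None)


def with_initial_savepoint(job_spec: dict, snapshots: Iterable[Snapshot]) -> dict:
    """Return a copy of job_spec with initialSavepointPath set to the latest snapshot.

    If no snapshot is found, the spec is returned unchanged (as a copy), so the
    job would start as it does today.
    """
    latest = latest_snapshot(snapshots)
    if latest is None:
        return dict(job_spec)
    return {**job_spec, "initialSavepointPath": latest.path}
```

A real implementation would also have to decide what counts as a valid,
completed snapshot, which this sketch ignores.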
On Wed, Mar 6, 2024 at 2:07 PM Kevin Lam wrote:
> Hi there,
>
Hi there,
We use the Flink Kubernetes Operator, and I am investigating how we can
easily support failing over a FlinkDeployment from one Kubernetes Cluster
to another in the case of an outage that requires us to migrate a large
number of FlinkDeployments from one K8s cluster to another.
I underst