Gyula Fora created FLINK-26577:
----------------------------------

             Summary: Avoid state loss when switching to last-state upgrade mode
                 Key: FLINK-26577
                 URL: https://issues.apache.org/jira/browse/FLINK-26577
             Project: Flink
          Issue Type: Sub-task
          Components: Kubernetes Operator
            Reporter: Gyula Fora


At the moment there are several corner cases which can lead to accidental state 
loss (or at least weird behaviour) when switching to last-state upgrade mode 
from other modes.

Two cases immediately come to mind:

savepoint to last-state:
When the new upgrade mode is last-state, the job deployment is simply deleted.
If HA was not enabled previously, there is no HA metadata to restore from, and
the last savepoint might be very far back in time.

stateless to last-state:
If checkpointing and HA are not enabled, the deployment will simply be killed
as before, and we might start the job from empty state. Taking a savepoint and
continuing from there may be the right approach in this case.

When switching between modes, we should perhaps consider the previous mode as
well as the target mode when deciding on the suspend strategy. We could also
simply disallow switching to last-state if HA was not enabled previously, but
that might be too restrictive.
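
To make the idea concrete, here is a rough sketch of how the suspend decision
could take the previously deployed spec into account. It is not operator code:
DeployedSpec, SuspendAction, choose_suspend_action and the allow_fallback flag
are hypothetical names made up for illustration, and the fallback to a
savepoint is only one possible policy.

```python
from dataclasses import dataclass
from enum import Enum


class UpgradeMode(Enum):
    STATELESS = "stateless"
    SAVEPOINT = "savepoint"
    LAST_STATE = "last-state"


class SuspendAction(Enum):
    # Hypothetical names for the ways the running job could be stopped.
    CANCEL = "cancel"
    CANCEL_WITH_SAVEPOINT = "cancel-with-savepoint"
    DELETE_DEPLOYMENT = "delete-deployment"


@dataclass(frozen=True)
class DeployedSpec:
    """What we assume is known about the spec that is currently running."""
    upgrade_mode: UpgradeMode
    ha_enabled: bool


def choose_suspend_action(previous, target, allow_fallback=True):
    """Pick a suspend action from the previous spec and the target mode."""
    if target is UpgradeMode.STATELESS:
        return SuspendAction.CANCEL
    if target is UpgradeMode.SAVEPOINT:
        return SuspendAction.CANCEL_WITH_SAVEPOINT
    # Target is last-state: deleting the deployment is only safe if the
    # running job was writing HA metadata that the new job can restore from.
    if previous.ha_enabled:
        return SuspendAction.DELETE_DEPLOYMENT
    if allow_fallback:
        # Assumed policy: take a savepoint once instead of losing state.
        return SuspendAction.CANCEL_WITH_SAVEPOINT
    # Stricter alternative: refuse the switch entirely.
    raise ValueError(
        "cannot switch from %s to last-state: HA was not enabled"
        % previous.upgrade_mode.value
    )
```

With this policy both cases above end in a savepoint instead of a plain delete
or kill when HA was off, while allow_fallback=False models the more
restrictive option of rejecting the switch.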



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
