Hi,

While using the last-state upgrade mode with flink-kubernetes-operator 1.2.0 and Flink 1.14.3, we occasionally run into the error shown below.
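For context, our deployments are configured roughly along the following lines. This is only a minimal sketch of the relevant settings; the image, jar path, HA storage directory, and resource sizes are placeholders rather than our actual spec:

  apiVersion: flink.apache.org/v1beta1
  kind: FlinkDeployment
  metadata:
    name: flinktest
  spec:
    image: flink:1.14.3                    # placeholder image
    flinkVersion: v1_14
    serviceAccount: flink
    flinkConfiguration:
      # Kubernetes HA, so that last-state upgrades can restore from the HA metadata
      high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
      high-availability.storageDir: s3://some-bucket/flink-ha        # placeholder path
      state.checkpoints.dir: s3://some-bucket/flink-checkpoints      # placeholder path
    jobManager:
      resource:
        memory: "2048m"
        cpu: 1
    taskManager:
      resource:
        memory: "2048m"
        cpu: 1
    job:
      jarURI: local:///opt/flink/usrlib/flinktest.jar                # placeholder jar
      parallelism: 1
      upgradeMode: last-state                                        # the upgrade mode in question

The status and events reported on the FlinkDeployment are: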
Status:
  Cluster Info:
    Flink - Revision:  98997ea @ 2022-01-08T23:23:54+01:00
    Flink - Version:   1.14.3
  Error:  HA metadata not available to restore from last state. It is possible that the job has finished or terminally failed, or the configmaps have been deleted. Manual restore required.
  Job Manager Deployment Status:  ERROR
  Job Status:
    Job Id:    e8dd04ea4b03f1817a4a4b9e5282f433
    Job Name:  flinktest
    Savepoint Info:
      Last Periodic Savepoint Timestamp:  0
      Savepoint History:
      Trigger Id:
      Trigger Timestamp:  0
      Trigger Type:       UNKNOWN
    Start Time:   1668660381400
    State:        RECONCILING
    Update Time:  1668994910151
  Reconciliation Status:
    Last Reconciled Spec:      ...
    Reconciliation Timestamp:  1668660371853
    State:                     DEPLOYED
  Task Manager:
    Label Selector:  component=taskmanager,app=flinktest
    Replicas:        1
Events:
  Type     Reason            Age                 From                  Message
  ----     ------            ----                ----                  -------
  Normal   JobStatusChanged  30m                 Job                   Job status changed from RUNNING to RESTARTING
  Normal   JobStatusChanged  29m                 Job                   Job status changed from RESTARTING to CREATED
  Normal   JobStatusChanged  28m                 Job                   Job status changed from CREATED to RESTARTING
  Warning  Missing           26m                 JobManagerDeployment  Missing JobManager deployment
  Warning  RestoreFailed     9s (x106 over 26m)  JobManagerDeployment  HA metadata not available to restore from last state. It is possible that the job has finished or terminally failed, or the configmaps have been deleted. Manual restore required.
  Normal   Submit            9s (x106 over 26m)  JobManagerDeployment  Starting deployment

We're happy with the last-state mode most of the time, but we run into this error occasionally. We found that the problem is not easy to reproduce: we tried killing the JMs and TMs, and even shut down the nodes on which the JMs and TMs were running. We also checked that the file size is not zero.

Thanks,
Dongwon
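P.S. For reference, killing the JMs and TMs was done roughly along these lines (a minimal sketch, assuming the standard app/component labels that the operator puts on the pods; the node shutdowns were done at the infrastructure level rather than through kubectl):

  # delete the JobManager and TaskManager pods of the flinktest deployment
  kubectl delete pod -l app=flinktest,component=jobmanager
  kubectl delete pod -l app=flinktest,component=taskmanager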