Hi,

We found that the Flink Kubernetes operator [0] sometimes fails to start
the JobManager after a FlinkDeployment is upgraded. We have to recreate
the FlinkDeployment to fix the problem. Has anyone else run into this issue?

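The following is a redacted log from the Flink operator. At 09:42:10 the
observer logs "Exception while listing jobs" and then reports that the
JobManager deployment does not exist. Once the status becomes MISSING, it
stays MISSING for at least 15 minutes.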


    2022-04-29 09:41:15,141 o.a.f.c.d.a.c.ApplicationClusterDeployer [INFO 
][namespace/flink-deployment-name] Submitting application in 'Application Mode'.
    2022-04-29 09:41:15,145 o.a.f.r.u.c.m.ProcessMemoryUtils [INFO 
][namespace/flink-deployment-name] The derived from fraction jvm overhead 
memory (2.400gb (2576980416 bytes)) is greater than its max value 1024.000mb 
(1073741824 bytes), max value will be used instead
    2022-04-29 09:41:15,146 o.a.f.r.u.c.m.ProcessMemoryUtils [INFO 
][namespace/flink-deployment-name] The derived from fraction jvm overhead 
memory (5.200gb (5583457568 bytes)) is greater than its max value 1024.000mb 
(1073741824 bytes), max value will be used instead
    2022-04-29 09:41:15,146 o.a.f.r.u.c.m.ProcessMemoryUtils [INFO 
][namespace/flink-deployment-name] The derived from fraction network memory 
(5.050gb (5422396292 bytes)) is greater than its max value 4.000gb (4294967296 
bytes), max value will be used instead
    2022-04-29 09:41:15,237 o.a.f.k.u.KubernetesUtils      [INFO 
][namespace/flink-deployment-name] Kubernetes deployment requires a fixed port. 
Configuration high-availability.jobmanager.port will be set to 6123
    2022-04-29 09:41:15,508 o.a.f.k.KubernetesClusterDescriptor [WARN 
][namespace/flink-deployment-name] Please note that Flink client 
operations(e.g. cancel, list, stop, savepoint, etc.) won't work from outside 
the Kubernetes cluster since 'kubernetes.rest-service.exposed.type' has been 
set to ClusterIP.
    2022-04-29 09:41:15,508 o.a.f.k.KubernetesClusterDescriptor [INFO 
][namespace/flink-deployment-name] Create flink application cluster 
flink-deployment-name successfully, JobManager Web Interface: 
http://flink-deployment-name.namespace:8081
    2022-04-29 09:41:15,510 o.a.f.k.o.s.FlinkService       [INFO 
][namespace/flink-deployment-name] Application cluster successfully deployed
    2022-04-29 09:41:15,583 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Reconciliation successfully completed
    2022-04-29 09:41:15,684 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Starting reconciliation
    2022-04-29 09:41:15,686 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] Observing JobManager deployment. Previous 
status: DEPLOYING
    2022-04-29 09:41:15,792 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] JobManager is being deployed
    2022-04-29 09:41:15,792 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Reconciliation successfully completed
    2022-04-29 09:41:20,795 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Starting reconciliation
    2022-04-29 09:41:20,797 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] Observing JobManager deployment. Previous 
status: DEPLOYING
    2022-04-29 09:41:20,896 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] JobManager is being deployed
    2022-04-29 09:41:20,897 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Reconciliation successfully completed
    2022-04-29 09:41:25,899 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Starting reconciliation
    2022-04-29 09:41:25,901 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] Observing JobManager deployment. Previous 
status: DEPLOYING
    2022-04-29 09:41:25,997 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] JobManager is being deployed
    2022-04-29 09:41:25,998 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Reconciliation successfully completed
    2022-04-29 09:41:29,518 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Starting reconciliation
    2022-04-29 09:41:29,520 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] Observing JobManager deployment. Previous 
status: DEPLOYING
    2022-04-29 09:41:30,631 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] JobManager is being deployed
    2022-04-29 09:41:30,631 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Reconciliation successfully completed
    2022-04-29 09:41:35,639 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Starting reconciliation
    2022-04-29 09:41:35,640 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] Observing JobManager deployment. Previous 
status: DEPLOYING
    2022-04-29 09:41:35,756 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] JobManager is being deployed
    2022-04-29 09:41:35,756 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Reconciliation successfully completed
    2022-04-29 09:41:40,759 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Starting reconciliation
    2022-04-29 09:41:40,760 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] Observing JobManager deployment. Previous 
status: DEPLOYING
    2022-04-29 09:41:40,864 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] JobManager is being deployed
    2022-04-29 09:41:40,864 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Reconciliation successfully completed
    2022-04-29 09:41:45,867 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Starting reconciliation
    2022-04-29 09:41:45,868 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] Observing JobManager deployment. Previous 
status: DEPLOYING
    2022-04-29 09:41:45,870 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] JobManager deployment port is ready, waiting 
for the Flink REST API...
    2022-04-29 09:41:45,870 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Reconciliation successfully completed
    2022-04-29 09:41:55,901 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Starting reconciliation
    2022-04-29 09:41:55,902 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] Observing JobManager deployment. Previous 
status: DEPLOYED_NOT_READY
    2022-04-29 09:41:55,902 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] JobManager deployment is ready
    2022-04-29 09:41:55,902 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] Observing job status
    2022-04-29 09:41:56,294 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] No job found on cluster yet
    2022-04-29 09:41:56,294 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Reconciliation successfully completed
    2022-04-29 09:41:58,443 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Starting reconciliation
    2022-04-29 09:41:58,445 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] Observing job status
    2022-04-29 09:42:10,489 o.a.f.k.o.o.JobObserver        
[ERROR][namespace/flink-deployment-name] Exception while listing jobs
    2022-04-29 09:42:10,489 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] Observing JobManager deployment. Previous 
status: READY
    2022-04-29 09:42:10,489 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] JobManager deployment does not exist
    2022-04-29 09:42:10,490 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Reconciliation successfully completed
    2022-04-29 09:42:25,521 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Starting reconciliation
    2022-04-29 09:42:25,522 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] Observing JobManager deployment. Previous 
status: MISSING
    2022-04-29 09:42:25,522 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] JobManager deployment does not exist
    2022-04-29 09:42:25,522 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Reconciliation successfully completed
    2022-04-29 09:42:40,526 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Starting reconciliation
    2022-04-29 09:42:40,527 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] Observing JobManager deployment. Previous 
status: MISSING
    2022-04-29 09:42:40,527 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] JobManager deployment does not exist
    2022-04-29 09:42:40,527 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Reconciliation successfully completed
    ...

    2022-04-29 10:00:55,862 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Starting reconciliation
    2022-04-29 10:00:55,863 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] Observing JobManager deployment. Previous 
status: MISSING
    2022-04-29 10:00:55,863 o.a.f.k.o.o.JobObserver        [INFO 
][namespace/flink-deployment-name] JobManager deployment does not exist
    2022-04-29 10:00:55,863 o.a.f.k.o.c.FlinkDeploymentController [INFO 
][namespace/flink-deployment-name] Reconciliation successfully completed
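
To check how long a deployment sat in one status, the "Previous status" entries in a log like the one above can be pulled out with a small script. This is only a sketch: the function names status_timeline and time_in_status are made up for illustration, and the script assumes the log format shown here, where each entry starts with a timestamp and may be wrapped across several lines.

```python
import re
from datetime import datetime

# An entry starts with a timestamp such as "2022-04-29 09:41:15,141".
TS = re.compile(r"^\s*\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s")
# The observer logs the status seen on the previous reconciliation.
STATUS = re.compile(r"Previous\s+status:\s+(\w+)")


def status_timeline(log_text):
    """Return a list of (timestamp, previous_status) pairs.

    Wrapped continuation lines are joined back onto the entry
    that precedes them before the status is searched for.
    """
    entries = []
    for line in log_text.splitlines():
        if TS.match(line):
            entries.append(line.strip())
        elif entries and line.strip():
            entries[-1] += " " + line.strip()

    timeline = []
    for entry in entries:
        match = STATUS.search(entry)
        if match:
            # The first 23 characters are the timestamp with milliseconds.
            ts = datetime.strptime(entry[:23], "%Y-%m-%d %H:%M:%S,%f")
            timeline.append((ts, match.group(1)))
    return timeline


def time_in_status(timeline, status):
    """Time between the first and last entry reporting `status`, or None."""
    seen = [ts for ts, s in timeline if s == status]
    return seen[-1] - seen[0] if seen else None
```

Calling time_in_status(status_timeline(log_text), "MISSING") on the operator log gives the span between the first and last reconciliation that saw MISSING as the previous status. It is a lower bound on the time spent in that status, not an exact figure.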


[0] https://github.com/apache/flink-kubernetes-operator


-- 
ChangZhuo Chen (陳昌倬) czchen@{czchen,debian}.org
http://czchen.info/
Key fingerprint = BA04 346D C2E1 FE63 C790  8793 CC65 B0CD EC27 5D5B
