Hi,

We found that the Flink operator [0] sometimes cannot start the JobManager after a FlinkDeployment is upgraded. To recover, we have to recreate the FlinkDeployment. Has anyone else run into this issue?
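To measure how long the deployment stays in each status, you can pull the `Previous status:` entries written by `JobObserver` out of the operator log. Below is a minimal Python sketch. It assumes the line format of the redacted excerpt that follows (timestamp, logger, level, resource, message). The helper names `status_timeline` and `time_in_status` are hypothetical and are not part of the operator.

```python
import re
from datetime import datetime, timedelta

# Matches JobObserver entries such as:
# 2022-04-29 09:42:25,522 o.a.f.k.o.o.JobObserver [INFO ][ns/name] Observing JobManager deployment. Previous status: MISSING
STATUS_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*Previous status: (?P<status>[A-Z_]+)"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # ",522" is parsed as 522 ms


def status_timeline(lines):
    """Return (timestamp, status) pairs for every 'Previous status' entry."""
    timeline = []
    for line in lines:
        match = STATUS_RE.match(line)
        if match:
            ts = datetime.strptime(match.group("ts"), TS_FORMAT)
            timeline.append((ts, match.group("status")))
    return timeline


def time_in_status(timeline, status):
    """Time between the first and last observation of `status`, or None."""
    stamps = [ts for ts, s in timeline if s == status]
    if not stamps:
        return None
    return stamps[-1] - stamps[0]


# A few entries copied from the operator log in this report.
SAMPLE_LOG = """\
2022-04-29 09:41:15,686 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] Observing JobManager deployment. Previous status: DEPLOYING
2022-04-29 09:41:55,902 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] Observing JobManager deployment. Previous status: DEPLOYED_NOT_READY
2022-04-29 09:42:10,489 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] Observing JobManager deployment. Previous status: READY
2022-04-29 09:42:25,522 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] Observing JobManager deployment. Previous status: MISSING
2022-04-29 10:00:55,863 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] Observing JobManager deployment. Previous status: MISSING
"""

timeline = status_timeline(SAMPLE_LOG.splitlines())
print([status for _, status in timeline])
print(time_in_status(timeline, "MISSING"))
```

On these sample lines, MISSING spans 09:42:25 to 10:00:55, about 18.5 minutes. This is a lower bound: "JobManager deployment does not exist" is first logged at 09:42:10, one reconciliation before the first `Previous status: MISSING` entry.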
The following is a redacted log from the Flink operator. Once the status becomes MISSING, it stays MISSING for at least 15 minutes.

2022-04-29 09:41:15,141 o.a.f.c.d.a.c.ApplicationClusterDeployer [INFO ][namespace/flink-deployment-name] Submitting application in 'Application Mode'.
2022-04-29 09:41:15,145 o.a.f.r.u.c.m.ProcessMemoryUtils [INFO ][namespace/flink-deployment-name] The derived from fraction jvm overhead memory (2.400gb (2576980416 bytes)) is greater than its max value 1024.000mb (1073741824 bytes), max value will be used instead
2022-04-29 09:41:15,146 o.a.f.r.u.c.m.ProcessMemoryUtils [INFO ][namespace/flink-deployment-name] The derived from fraction jvm overhead memory (5.200gb (5583457568 bytes)) is greater than its max value 1024.000mb (1073741824 bytes), max value will be used instead
2022-04-29 09:41:15,146 o.a.f.r.u.c.m.ProcessMemoryUtils [INFO ][namespace/flink-deployment-name] The derived from fraction network memory (5.050gb (5422396292 bytes)) is greater than its max value 4.000gb (4294967296 bytes), max value will be used instead
2022-04-29 09:41:15,237 o.a.f.k.u.KubernetesUtils [INFO ][namespace/flink-deployment-name] Kubernetes deployment requires a fixed port. Configuration high-availability.jobmanager.port will be set to 6123
2022-04-29 09:41:15,508 o.a.f.k.KubernetesClusterDescriptor [WARN ][namespace/flink-deployment-name] Please note that Flink client operations(e.g. cancel, list, stop, savepoint, etc.) won't work from outside the Kubernetes cluster since 'kubernetes.rest-service.exposed.type' has been set to ClusterIP.
2022-04-29 09:41:15,508 o.a.f.k.KubernetesClusterDescriptor [INFO ][namespace/flink-deployment-name] Create flink application cluster flink-deployment-name successfully, JobManager Web Interface: http://flink-deployment-name.namespace:8081
2022-04-29 09:41:15,510 o.a.f.k.o.s.FlinkService [INFO ][namespace/flink-deployment-name] Application cluster successfully deployed
2022-04-29 09:41:15,583 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Reconciliation successfully completed
2022-04-29 09:41:15,684 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Starting reconciliation
2022-04-29 09:41:15,686 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] Observing JobManager deployment. Previous status: DEPLOYING
2022-04-29 09:41:15,792 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] JobManager is being deployed
2022-04-29 09:41:15,792 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Reconciliation successfully completed
2022-04-29 09:41:20,795 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Starting reconciliation
2022-04-29 09:41:20,797 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] Observing JobManager deployment. Previous status: DEPLOYING
2022-04-29 09:41:20,896 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] JobManager is being deployed
2022-04-29 09:41:20,897 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Reconciliation successfully completed
2022-04-29 09:41:25,899 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Starting reconciliation
2022-04-29 09:41:25,901 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] Observing JobManager deployment. Previous status: DEPLOYING
2022-04-29 09:41:25,997 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] JobManager is being deployed
2022-04-29 09:41:25,998 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Reconciliation successfully completed
2022-04-29 09:41:29,518 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Starting reconciliation
2022-04-29 09:41:29,520 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] Observing JobManager deployment. Previous status: DEPLOYING
2022-04-29 09:41:30,631 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] JobManager is being deployed
2022-04-29 09:41:30,631 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Reconciliation successfully completed
2022-04-29 09:41:35,639 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Starting reconciliation
2022-04-29 09:41:35,640 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] Observing JobManager deployment. Previous status: DEPLOYING
2022-04-29 09:41:35,756 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] JobManager is being deployed
2022-04-29 09:41:35,756 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Reconciliation successfully completed
2022-04-29 09:41:40,759 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Starting reconciliation
2022-04-29 09:41:40,760 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] Observing JobManager deployment. Previous status: DEPLOYING
2022-04-29 09:41:40,864 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] JobManager is being deployed
2022-04-29 09:41:40,864 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Reconciliation successfully completed
2022-04-29 09:41:45,867 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Starting reconciliation
2022-04-29 09:41:45,868 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] Observing JobManager deployment. Previous status: DEPLOYING
2022-04-29 09:41:45,870 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] JobManager deployment port is ready, waiting for the Flink REST API...
2022-04-29 09:41:45,870 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Reconciliation successfully completed
2022-04-29 09:41:55,901 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Starting reconciliation
2022-04-29 09:41:55,902 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] Observing JobManager deployment. Previous status: DEPLOYED_NOT_READY
2022-04-29 09:41:55,902 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] JobManager deployment is ready
2022-04-29 09:41:55,902 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] Observing job status
2022-04-29 09:41:56,294 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] No job found on cluster yet
2022-04-29 09:41:56,294 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Reconciliation successfully completed
2022-04-29 09:41:58,443 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Starting reconciliation
2022-04-29 09:41:58,445 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] Observing job status
2022-04-29 09:42:10,489 o.a.f.k.o.o.JobObserver [ERROR][namespace/flink-deployment-name] Exception while listing jobs
2022-04-29 09:42:10,489 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] Observing JobManager deployment. Previous status: READY
2022-04-29 09:42:10,489 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] JobManager deployment does not exist
2022-04-29 09:42:10,490 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Reconciliation successfully completed
2022-04-29 09:42:25,521 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Starting reconciliation
2022-04-29 09:42:25,522 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] Observing JobManager deployment. Previous status: MISSING
2022-04-29 09:42:25,522 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] JobManager deployment does not exist
2022-04-29 09:42:25,522 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Reconciliation successfully completed
2022-04-29 09:42:40,526 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Starting reconciliation
2022-04-29 09:42:40,527 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] Observing JobManager deployment. Previous status: MISSING
2022-04-29 09:42:40,527 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] JobManager deployment does not exist
2022-04-29 09:42:40,527 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Reconciliation successfully completed
...
2022-04-29 10:00:55,862 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Starting reconciliation
2022-04-29 10:00:55,863 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] Observing JobManager deployment. Previous status: MISSING
2022-04-29 10:00:55,863 o.a.f.k.o.o.JobObserver [INFO ][namespace/flink-deployment-name] JobManager deployment does not exist
2022-04-29 10:00:55,863 o.a.f.k.o.c.FlinkDeploymentController [INFO ][namespace/flink-deployment-name] Reconciliation successfully completed

[0] https://github.com/apache/flink-kubernetes-operator

-- 
ChangZhuo Chen (陳昌倬) czchen@{czchen,debian}.org
http://czchen.info/
Key fingerprint = BA04 346D C2E1 FE63 C790 8793 CC65 B0CD EC27 5D5B