Peter Vary created FLINK-30315:
----------------------------------

             Summary: Add more information about image pull failures to the operator log
                 Key: FLINK-30315
                 URL: https://issues.apache.org/jira/browse/FLINK-30315
             Project: Flink
          Issue Type: Improvement
          Components: Kubernetes Operator
            Reporter: Peter Vary

When there is an image pull error, this is what we see in the operator log:
{code:java}
org.apache.flink.kubernetes.operator.exception.DeploymentFailedException: Back-off pulling image "flink:1.14"
    at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.checkContainerBackoff(AbstractFlinkDeploymentObserver.java:194)
    at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeJmDeployment(AbstractFlinkDeploymentObserver.java:150)
    at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:84)
    at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:55)
    at org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:56)
    at org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:32)
    at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:113)
    at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:54)
    at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:136)
    at io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:94)
    at org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
    at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:93)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:130)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:110)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:81)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:54)
    at io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:406)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
{code}
This is the information we have on the Kubernetes side:
{code}
  Normal   Scheduled  2m19s               default-scheduler  Successfully assigned default/quickstart-base-86787586cd-lb7j6 to minikube
  Warning  Failed     20s                 kubelet            Failed to pull image "flink:1.14": rpc error: code = Unknown desc = context deadline exceeded
  Warning  Failed     20s                 kubelet            Error: ErrImagePull
  Normal   BackOff    19s                 kubelet            Back-off pulling image "flink:1.14"
  Warning  Failed     19s                 kubelet            Error: ImagePullBackOff
  Normal   Pulling    7s (x2 over 2m19s)  kubelet            Pulling image "flink:1.14"
{code}
It would be good to add the additional message (in this case {{Failed to pull image "flink:1.14": rpc error: code = Unknown desc = context deadline exceeded}}) to the message of the {{DeploymentFailedException}} for traceability.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
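
One possible shape for the proposed improvement is sketched below. It is only an illustration, not the operator's actual code: {{PodEvent}}, {{findPullFailureDetail}} and {{enrich}} are hypothetical names, the events are assumed to arrive oldest first, and the "Failed to pull image" prefix is taken from the kubelet event shown above rather than from any documented contract. The sketch needs Java 16 or later because it uses a record.

```java
import java.util.List;
import java.util.Optional;

public class ImagePullMessageSketch {

    /** Hypothetical, simplified stand-in for a Kubernetes pod event (not a real client type). */
    record PodEvent(String type, String reason, String message) {}

    /**
     * Looks for the kubelet warning that carries the underlying pull error.
     * Assumes events are ordered oldest first, so the last match is the most recent one.
     */
    static Optional<String> findPullFailureDetail(List<PodEvent> events) {
        return events.stream()
                .filter(e -> "Warning".equals(e.type()) && "Failed".equals(e.reason()))
                .map(PodEvent::message)
                .filter(m -> m != null && m.startsWith("Failed to pull image"))
                .reduce((earlier, later) -> later);
    }

    /** Appends the pull error detail to the back-off message, if one was found. */
    static String enrich(String backoffMessage, List<PodEvent> events) {
        return findPullFailureDetail(events)
                .map(detail -> backoffMessage + " (" + detail + ")")
                .orElse(backoffMessage);
    }

    public static void main(String[] args) {
        // Events copied from the Kubernetes output in this issue.
        List<PodEvent> events = List.of(
                new PodEvent("Warning", "Failed",
                        "Failed to pull image \"flink:1.14\": rpc error: code = Unknown desc = context deadline exceeded"),
                new PodEvent("Warning", "Failed", "Error: ErrImagePull"),
                new PodEvent("Normal", "BackOff", "Back-off pulling image \"flink:1.14\""),
                new PodEvent("Warning", "Failed", "Error: ImagePullBackOff"));

        System.out.println(enrich("Back-off pulling image \"flink:1.14\"", events));
    }
}
```

With something along these lines, the text passed to the {{DeploymentFailedException}} would carry both the back-off reason and the underlying kubelet error. When no matching event is found, the message stays as it is today.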