Hello This is my first setup with Apache Beam and trying to use the flink runner for it. Following the example of https://github.com/apache/flink-kubernetes-operator/tree/main/examples/flink-beam-example I got my app running inside my local minikube. The app is pretty stateless - minimal transformation between two IO.
Now, while I was testing things, my Mac decided to upgrade and thus minikube was also restarted. After the restart, it seems like the Flinkdeployment was still installed but no jobmanager or task managers pods were running and doing get flinkdeployment shows it to be: Job Status: Reconciling and Reconciliation: Deployed. Description of the job is as follows: Name: myapp-beam-example Namespace: default Labels: <none> Annotations: <none> API Version: flink.apache.org/v1beta1 Kind: FlinkDeployment Metadata: Creation Timestamp: 2023-02-17T05:38:33Z Finalizers: flinkdeployments.flink.apache.org/finalizer Generation: 3 Managed Fields: API Version: flink.apache.org/v1beta1 Fields Type: FieldsV1 fieldsV1: f:metadata: f:finalizers: .: v:"flinkdeployments.flink.apache.org/finalizer": f:spec: f:job: f:state: f:jobManager: f:replicas: Manager: fabric8-kubernetes-client Operation: Update Time: 2023-02-17T05:38:33Z API Version: flink.apache.org/v1beta1 Fields Type: FieldsV1 fieldsV1: f:spec: .: f:flinkConfiguration: .: f:taskmanager.numberOfTaskSlots: f:flinkVersion: f:image: f:job: .: f:args: f:entryClass: f:jarURI: f:upgradeMode: f:jobManager: .: f:resource: .: f:cpu: f:memory: f:serviceAccount: f:taskManager: .: f:resource: .: f:cpu: f:memory: Manager: kubectl-create Operation: Update Time: 2023-02-17T05:38:33Z API Version: flink.apache.org/v1beta1 Fields Type: FieldsV1 fieldsV1: f:metadata: f:annotations: .: f:kubectl.kubernetes.io/last-applied-configuration: f:spec: f:job: f:parallelism: Manager: kubectl-client-side-apply Operation: Update Time: 2023-02-17T05:45:11Z API Version: flink.apache.org/v1beta1 Fields Type: FieldsV1 fieldsV1: f:status: .: f:clusterInfo: .: f:flink-revision: f:flink-version: f:jobManagerDeploymentStatus: f:jobStatus: .: f:jobId: f:jobName: f:savepointInfo: .: f:lastPeriodicSavepointTimestamp: f:savepointHistory: f:startTime: f:state: f:updateTime: f:reconciliationStatus: .: f:lastReconciledSpec: f:lastStableSpec: f:reconciliationTimestamp: f:state: f:taskManager: .: f:labelSelector: f:replicas: Manager: fabric8-kubernetes-client Operation: Update Subresource: status Time: 2023-02-18T10:15:42Z Resource Version: 192678 UID: 71b851e5-4b99-434a-9027-d528eb8b1180 Spec: Flink Configuration: taskmanager.numberOfTaskSlots: 1 Flink Version: v1_16 Image: myapp-beam:latest Job: Args: --runner=FlinkRunner --bootstrapServers=kafka-service:9092 Entry Class: com.netskope.myapp.myapp Jar URI: local:///opt/flink/usrlib/beam-runner.jar Parallelism: 2 State: running Upgrade Mode: stateless Job Manager: Replicas: 1 Resource: Cpu: 1 Memory: 2048m Service Account: flink Task Manager: Resource: Cpu: 1 Memory: 2048m Status: Cluster Info: Flink - Revision: DeadD0d0 @ 1970-01-01T01:00:00+01:00 Flink - Version: 1.16.1 Job Manager Deployment Status: MISSING Job Status: Job Id: adb5186e96f6d36637f919fb5d9aad70 Job Name: myapp-flink-0217055545-790af232 Savepoint Info: Last Periodic Savepoint Timestamp: 0 Savepoint History: Start Time: 1676613348905 State: RECONCILING Update Time: 1676628937180 Reconciliation Status: Last Reconciled Spec: {"spec":{"job":{"jarURI":"local:///opt/flink/usrlib/beam-runner.jar","parallelism":2,"entryClass":"com.netskope.myapp.myapp","args":["--runner=FlinkRunner","--bootstrapServers=kafka-service:9092"],"state":"running","savepointTriggerNonce":null,"initialSavepointPath":null,"upgradeMode":"stateless","allowNonRestoredState":null},"restartNonce":null,"flinkConfiguration":{"taskmanager.numberOfTaskSlots":"1"},"image":"myapp-beam:latest","imagePullPolicy":null,"serviceAccount":"flink","flinkVersion":"v1_16","ingress":null,"podTemplate":null,"jobManager":{"resource":{"cpu":1.0,"memory":"2048m"},"replicas":1,"podTemplate":null},"taskManager":{"resource":{"cpu":1.0,"memory":"2048m"},"replicas":null,"podTemplate":null},"logConfiguration":null,"mode":null},"resource_metadata":{"apiVersion":" flink.apache.org/v1beta1 ","metadata":{"generation":3},"firstDeployment":false}} Last Stable Spec: {"spec":{"job":{"jarURI":"local:///opt/flink/usrlib/beam-runner.jar","parallelism":2,"entryClass":"com.netskope.myapp.myapp","args":["--runner=FlinkRunner","--bootstrapServers=kafka-service:9092"],"state":"running","savepointTriggerNonce":null,"initialSavepointPath":null,"upgradeMode":"stateless","allowNonRestoredState":null},"restartNonce":null,"flinkConfiguration":{"taskmanager.numberOfTaskSlots":"1"},"image":"myapp-beam:latest","imagePullPolicy":null,"serviceAccount":"flink","flinkVersion":"v1_16","ingress":null,"podTemplate":null,"jobManager":{"resource":{"cpu":1.0,"memory":"2048m"},"replicas":1,"podTemplate":null},"taskManager":{"resource":{"cpu":1.0,"memory":"2048m"},"replicas":null,"podTemplate":null},"logConfiguration":null,"mode":null},"resource_metadata":{"apiVersion":" flink.apache.org/v1beta1 ","metadata":{"generation":3},"firstDeployment":false}} Reconciliation Timestamp: 1676612715537 State: DEPLOYED Task Manager: Label Selector: component=taskmanager,app=myapp-beam-example Replicas: 2 Events: <none> tried to search for this on Github and in the mailing list archives, without luck. Just trying to understand what this means and how will it affect things if some failures like this happen in production.. If somebody can point me to the relevant document about this, that would be helpful too. -- Ritesh