Hey Guyla, dev team,

I deployed rc-2 with Helm on AWS EKS with HA enabled (3 pods). The operator watches 3 namespaces. I successfully deployed an application cluster (Flink 1.17) via a pod template, and encountered the following errors:

1. Listing nodes is forbidden:

      org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException","message":"Failure executing: GET at: https://172.20.0.1/api/v1/nodes. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. nodes is forbidden: User "system:serviceaccount:dev-0-flink-clusters:dev-0-xsight-flink-operator-sa" cannot list resource "nodes" in API group "" at the cluster scope."

   The role seems to be correct. I commented on the following ticket: https://issues.apache.org/jira/browse/FLINK-32041
   In addition, I noticed that kubernetes.rest-service.exposed.type was set to NodePort; once I changed it to ClusterIP, the above error disappeared [1]. Is there any chance it looks for a kubeconfig file instead of reading the service account?

2. When the cluster is deleted, the idle (non-leader) pods repeatedly throw the following error:

      [2023-05-14T12:00:50,388][Error] {} [i.f.k.c.i.i.c.SharedProcessor]: apps/v1/namespaces/dev-0-flink-shadow-clusters/deployments failed invoking InformerEventSource{resourceClass: Deployment} event handler: Cannot receive event after a delete event received
      java.lang.IllegalStateException: Cannot receive event after a delete event received

   (stack trace enclosed)

In addition, I'm not sure whether it's an issue or not, but the per-cluster autoscaler configurations are shown neither in the Flink web UI nor in the response of /jobmanager/config.
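A note on error 1: the message names system:serviceaccount:dev-0-flink-clusters:dev-0-xsight-flink-operator-sa as the caller, which suggests the service-account credentials are being used rather than a kubeconfig file. Nodes are cluster-scoped, so a namespaced Role cannot authorize "list nodes"; that takes a ClusterRole bound through a ClusterRoleBinding. One plausible reading of the NodePort/ClusterIP observation (not confirmed in this thread) is that resolving a NodePort REST endpoint needs a node address, and therefore a cluster-wide node listing, while a ClusterIP endpoint can be built from the service name alone. The stdlib-only Java sketch below (Java 16+ for records) models that difference; the class, records, method names, node addresses and the `<service>.<namespace>:<port>` form are assumptions for illustration, not Flink's implementation.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical, simplified model (NOT Flink's code) of how a client could
 * resolve a REST endpoint for two service exposure types.
 */
public class NodePortEndpointSketch {

    /** A node address as reported in a Node's status (type + value). */
    record NodeAddress(String type, String address) {}

    /** Minimal stand-in for a Kubernetes Node object. */
    record Node(String name, List<NodeAddress> addresses) {}

    /**
     * NodePort: needs the full node list (a cluster-scoped "list nodes" call in a
     * real cluster) to find an address of the requested type, e.g. "InternalIP".
     */
    static Optional<String> resolveRestEndpoint(List<Node> clusterNodes, String addressType, int nodePort) {
        return clusterNodes.stream()
                .flatMap(node -> node.addresses().stream())
                .filter(a -> addressType.equals(a.type()))
                .map(NodeAddress::address)
                .filter(ip -> !ip.isEmpty())
                .findFirst()
                .map(ip -> ip + ":" + nodePort);
    }

    /** ClusterIP: the in-cluster service name is enough; no Node lookup at all. */
    static String clusterIpEndpoint(String serviceName, String namespace, int port) {
        return serviceName + "." + namespace + ":" + port;
    }

    public static void main(String[] args) {
        // Hypothetical nodes; in a real cluster this list requires cluster-scoped RBAC.
        List<Node> nodes = List.of(
                new Node("node-a", List.of(
                        new NodeAddress("Hostname", "node-a"),
                        new NodeAddress("InternalIP", "10.0.1.12"))),
                new Node("node-b", List.of(new NodeAddress("InternalIP", "10.0.1.13"))));

        System.out.println(resolveRestEndpoint(nodes, "InternalIP", 30081).orElse("none"));
        System.out.println(clusterIpEndpoint("my-cluster-rest", "dev-0-flink-clusters", 8081));
    }
}
```

In this model, switching to ClusterIP removes the only step that needs the cluster-scoped permission, which would match the error disappearing after the change.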
[1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#accessing-flinks-web-ui

Thanks,
Tamir

________________________________
From: Jim Busche <jbus...@us.ibm.com>
Sent: Saturday, May 13, 2023 5:59 PM
To: dev@flink.apache.org <dev@flink.apache.org>; Hao t Chang <htch...@us.ibm.com>; Anthony Garrard <garr...@uk.ibm.com>
Subject: Re: [VOTE] Apache Flink Kubernetes Operator Release 1.5.0, release candidate #2

Hi Guyla,

I was able to deploy rc-2 with Helm on a kind cluster, and it deployed the sample. But I'm still struggling with rc-2 on OpenShift: there's some kind of RBAC permission issue that I haven't been able to solve when it deploys the flinkdep or flinksessionjobs.

    oc get flinkdep
    NAME                                    JOB STATUS   LIFECYCLE STATE
    basic-example                                        UPGRADING
    basic-session-deployment-only-example                UPGRADING

    oc get flinksessionjobs
    NAME                             JOB STATUS   LIFECYCLE STATE
    basic-session-job-only-example

    oc describe flinkdep basic-example
    …
    Status:
      Cluster Info:
      Error: {"type":"org.apache.flink.kubernetes.operator.exception.ReconciliationException","message":"org.apache.flink.client.deployment.ClusterDeploymentException: Could not create Kubernetes cluster \"basic-example\".","throwableList":[{"type":"org.apache.flink.client.deployment.ClusterDeploymentException","message":"Could not create Kubernetes cluster \"basic-example\"."},{"type":"org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException","message":"Failure executing: POST at: https://172.30.0.1/apis/apps/v1/namespaces/default/deployments. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. deployments.apps \"basic-example\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>."}]}
      Job Manager Deployment Status:  MISSING

I haven't been able to spot why/what's different between the 1.5 and 1.4 releases (1.4 still deploys fine). Hoping someone has an idea of what might be wrong.

Thanks,
Jim
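The OpenShift error text matches the check performed by the Kubernetes OwnerReferencesPermissionEnforcement admission plugin: setting blockOwnerDeletion on an ownerReference requires "update" permission on the "finalizers" subresource of the referenced owner. As far as I know, OpenShift enables this plugin while a default kind cluster does not, which would be consistent with the sample deploying on kind. Presumably the owner here is the FlinkDeployment basic-example, so the grant in question would be "update" on flinkdeployments/finalizers in the flink.apache.org API group; this is an assumption, not confirmed in the thread. The Java sketch below (Java 16+ for records) is a simplified model with hypothetical names; real RBAC matching (wildcards, resourceNames, aggregated roles) is not modeled.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical, simplified model (NOT the real admission plugin code) of the
 * blockOwnerDeletion check enforced by OwnerReferencesPermissionEnforcement.
 */
public class OwnerRefAdmissionSketch {

    /** One granted permission: API group, resource ("name" or "name/subresource"), verb. */
    record Rule(String apiGroup, String resource, String verb) {}

    /** The ownerReference placed on the object being created. */
    record OwnerReference(String apiGroup, String resource, String name, boolean blockOwnerDeletion) {}

    /** Exact-match permission lookup; wildcards are deliberately not modeled. */
    static boolean allowed(Set<Rule> grants, String apiGroup, String resource, String verb) {
        return grants.contains(new Rule(apiGroup, resource, verb));
    }

    /** blockOwnerDeletion=true requires "update" on the owner's "finalizers" subresource. */
    static String admit(Set<Rule> grants, OwnerReference ref) {
        if (!ref.blockOwnerDeletion()) {
            return "allowed";
        }
        boolean canSetFinalizers =
                allowed(grants, ref.apiGroup(), ref.resource() + "/finalizers", "update");
        return canSetFinalizers
                ? "allowed"
                : "forbidden: cannot set blockOwnerDeletion if an ownerReference refers to"
                        + " a resource you can't set finalizers on";
    }

    public static void main(String[] args) {
        // Hypothetical grants: the operator can create Deployments and update FlinkDeployments.
        Set<Rule> grants = new HashSet<>(Set.of(
                new Rule("apps", "deployments", "create"),
                new Rule("flink.apache.org", "flinkdeployments", "update")));
        OwnerReference owner =
                new OwnerReference("flink.apache.org", "flinkdeployments", "basic-example", true);

        System.out.println(admit(grants, owner)); // rejected: no finalizers permission yet

        // Granting update on the finalizers subresource is what the check asks for.
        grants.add(new Rule("flink.apache.org", "flinkdeployments/finalizers", "update"));
        System.out.println(admit(grants, owner)); // allowed
    }
}
```

Note that "update" on flinkdeployments alone does not satisfy the check in this model; the subresource must be granted explicitly.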
java.lang.IllegalStateException: Cannot receive event after a delete event received
    at io.javaoperatorsdk.operator.processing.event.ResourceState.markEventReceived(ResourceState.java:83)
    at io.javaoperatorsdk.operator.processing.event.EventProcessor.markEventReceived(EventProcessor.java:197)
    at io.javaoperatorsdk.operator.processing.event.EventProcessor.handleEventMarking(EventProcessor.java:186)
    at io.javaoperatorsdk.operator.processing.event.EventProcessor.handleEvent(EventProcessor.java:104)
    at io.javaoperatorsdk.operator.processing.event.source.informer.InformerEventSource.lambda$propagateEvent$3(InformerEventSource.java:200)
    at java.lang.Iterable.forEach(Unknown Source)[?:?]
    at io.javaoperatorsdk.operator.processing.event.source.informer.InformerEventSource.propagateEvent(InformerEventSource.java:190)
    at io.javaoperatorsdk.operator.processing.event.source.informer.InformerEventSource.onDelete(InformerEventSource.java:140)
    at io.javaoperatorsdk.operator.processing.event.source.informer.InformerEventSource.onDelete(InformerEventSource.java:67)
    at io.fabric8.kubernetes.client.informers.impl.cache.ProcessorListener$DeleteNotification.handle(ProcessorListener.java:122)
    at io.fabric8.kubernetes.client.informers.impl.cache.ProcessorListener.add(ProcessorListener.java:50)
    at io.fabric8.kubernetes.client.informers.impl.cache.SharedProcessor.lambda$distribute$0(SharedProcessor.java:87)
    at io.fabric8.kubernetes.client.informers.impl.cache.SharedProcessor.lambda$distribute$1(SharedProcessor.java:110)
    at io.fabric8.kubernetes.client.utils.internal.SerialExecutor.lambda$execute$0(SerialExecutor.java:58)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
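The enclosed trace shows the exception raised by ResourceState.markEventReceived in the Java Operator SDK, reached from InformerEventSource.onDelete for the Deployment informer. The stdlib-only Java sketch below models that kind of per-resource state tracking; ResourceStateSketch, its methods, and the "namespace/name" ids are hypothetical and not JOSDK's actual code. It shows one plausible sequence consistent with the trace: the primary resource (presumably the FlinkDeployment) is marked as deleted, and a later event for its secondary JobManager Deployment, mapped back to the same primary, is rejected. The model does not explain why only the non-leader pods report it.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model (NOT the Java Operator SDK's code) of
 * per-primary-resource event state, where events after a delete are rejected.
 */
public class ResourceStateSketch {

    enum EventingState { NO_EVENT_PRESENT, EVENT_PRESENT, DELETE_EVENT_PRESENT }

    static final class ResourceState {
        private EventingState eventing = EventingState.NO_EVENT_PRESENT;

        void markDeleteEventReceived() {
            eventing = EventingState.DELETE_EVENT_PRESENT;
        }

        /** Rejects any new event once a delete has been recorded for this resource. */
        void markEventReceived() {
            if (eventing == EventingState.DELETE_EVENT_PRESENT) {
                throw new IllegalStateException("Cannot receive event after a delete event received");
            }
            eventing = EventingState.EVENT_PRESENT;
        }

        boolean deleteEventPresent() {
            return eventing == EventingState.DELETE_EVENT_PRESENT;
        }
    }

    /** State per primary resource, keyed by a hypothetical "namespace/name" id. */
    private final Map<String, ResourceState> states = new HashMap<>();

    /** Delete event for the primary resource itself. */
    void onPrimaryDeleted(String primaryId) {
        states.computeIfAbsent(primaryId, key -> new ResourceState()).markDeleteEventReceived();
    }

    /** Event from a secondary resource (e.g. a Deployment) mapped to its primary. */
    void onSecondaryEvent(String primaryId) {
        states.computeIfAbsent(primaryId, key -> new ResourceState()).markEventReceived();
    }

    /** Hypothetical guarded variant for comparison: drops events for already-deleted primaries. */
    boolean onSecondaryEventGuarded(String primaryId) {
        ResourceState state = states.computeIfAbsent(primaryId, key -> new ResourceState());
        if (state.deleteEventPresent()) {
            return false; // event ignored
        }
        state.markEventReceived();
        return true;
    }

    public static void main(String[] args) {
        ResourceStateSketch processor = new ResourceStateSketch();
        String id = "dev-0-flink-shadow-clusters/my-cluster"; // hypothetical resource id
        processor.onPrimaryDeleted(id);
        try {
            processor.onSecondaryEvent(id);
        } catch (IllegalStateException e) {
            System.out.println("unguarded: " + e.getMessage());
        }
        System.out.println("guarded accepted: " + processor.onSecondaryEventGuarded(id));
    }
}
```

The guarded variant is only there to contrast the two behaviors; it is not a proposed patch for the operator or the SDK.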