Hey Guyla, dev-team,
I deployed rc-2 with helm on AWS EKS with HA enabled (3 pods).
The operator watches 3 namespaces.
I successfully deployed an application cluster (Flink 1.17) via pod template. I
encountered the following errors:
1.
org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException","message":"Failure
executing: GET at: https://172.20.0.1/api/v1/nodes. Message:
Forbidden!Configured service account doesn't have access. Service account may
have been revoked. nodes is forbidden: User
"system:serviceaccount:dev-0-flink-clusters:dev-0-xsight-flink-operator-sa"
cannot list resource "nodes" in API group "" at the cluster scope."
The role seems to be correct. I commented on the following ticket:
https://issues.apache.org/jira/browse/FLINK-32041
In addition, I noticed that kubernetes.rest-service.exposed.type was set to
NodePort; once I changed it to ClusterIP, the above error disappeared. [1]
Is there any chance it looks for kube.config file instead of reading the
service account?
2. When the cluster is deleted, the idle pods (not leaders) repeatedly throw
the following error:
[2023-05-14T12:00:50,388][Error] {} [i.f.k.c.i.i.c.SharedProcessor]:
apps/v1/namespaces/dev-0-flink-shadow-clusters/deployments failed invoking
InformerEventSource{resourceClass: Deployment} event handler: Cannot receive
event after a delete event received
java.lang.IllegalStateException: Cannot receive event after a delete event
received (enclosed stacktrace)
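Regarding error 1, here is a minimal sketch of how the service account's permission could be checked directly against the API server. It assumes kubectl is installed and pointed at the cluster, and that the caller is allowed to impersonate service accounts; the service account identity is copied from the error message above.

```shell
# Identity copied from the "forbidden" message in error 1.
SA_NAMESPACE="dev-0-flink-clusters"
SA_NAME="dev-0-xsight-flink-operator-sa"
SA_USER="system:serviceaccount:${SA_NAMESPACE}:${SA_NAME}"

# Ask the API server whether that identity may list nodes (a cluster-scoped
# resource). Prints "yes" or "no". Skipped when kubectl is not installed.
if command -v kubectl >/dev/null 2>&1; then
  kubectl auth can-i list nodes --as="${SA_USER}" --request-timeout=5s || true
fi
```

A "no" here would be consistent with the error: nodes are cluster-scoped, so a namespaced Role cannot grant access to them; that would need a ClusterRole and a ClusterRoleBinding.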
In addition, I'm not sure whether it's an issue or not, but the autoscaler
configurations (per cluster) are shown neither in the Flink web UI nor in the
response when calling /jobmanager/config.
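A minimal sketch of how the /jobmanager/config check could be repeated from the command line. The REST address (localhost:8081, for example through a port-forward) is an assumption, as is matching on the substring "autoscaler" in the key names; filter_autoscaler is a hypothetical helper, not part of Flink.

```shell
# Hypothetical helper: keep only config entries whose key mentions "autoscaler".
# /jobmanager/config returns a JSON array of {"key": ..., "value": ...} objects;
# this splits the array into one object per line and greps the keys.
filter_autoscaler() {
  tr '}' '\n' | grep -i '"key" *: *"[^"]*autoscaler' || true
}

# Assumed address of the JobManager REST endpoint (e.g. via kubectl port-forward).
FLINK_REST="${FLINK_REST:-http://localhost:8081}"

# Fetch the effective configuration and show only autoscaler-related entries.
if command -v curl >/dev/null 2>&1; then
  curl -s --max-time 5 "${FLINK_REST}/jobmanager/config" | filter_autoscaler
fi
```

Empty output would mean that no key containing "autoscaler" is present in the JobManager's reported configuration.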
[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#accessing-flinks-web-ui
Thanks,
Tamir
________________________________
From: Jim Busche <[email protected]>
Sent: Saturday, May 13, 2023 5:59 PM
To: [email protected] <[email protected]>; Hao t Chang
<[email protected]>; Anthony Garrard <[email protected]>
Subject: Re: [VOTE] Apache Flink Kubernetes Operator Release 1.5.0, release
candidate #2
Hi Guyla,
I was able to deploy rc-2 with helm on a kind cluster and it was able to deploy
the sample. But I'm still struggling on OpenShift with rc-2. There's some
kind of RBAC permission issue that I haven't been able to solve when it deploys
the flinkdep or flinksessionjobs.
oc get flinkdep
NAME                                    JOB STATUS   LIFECYCLE STATE
basic-example                                        UPGRADING
basic-session-deployment-only-example                UPGRADING
oc get flinksessionjobs
NAME                             JOB STATUS   LIFECYCLE STATE
basic-session-job-only-example
oc describe flinkdep basic-example
…
Status:
Cluster Info:
Error:
{"type":"org.apache.flink.kubernetes.operator.exception.ReconciliationException","message":"org.apache.flink.client.deployment.ClusterDeploymentException:
Could not create Kubernetes cluster
\"basic-example\".","throwableList":[{"type":"org.apache.flink.client.deployment.ClusterDeploymentException","message":"Could
not create Kubernetes cluster
\"basic-example\"."},{"type":"org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException","message":"Failure
executing: POST at:
https://172.30.0.1/apis/apps/v1/namespaces/default/deployments. Message:
Forbidden!Configured service account doesn't have access. Service account may
have been revoked. deployments.apps \"basic-example\" is forbidden: cannot set
blockOwnerDeletion if an ownerReference refers to a resource you can't set
finalizers on: , <nil>."}]}
Job Manager Deployment Status: MISSING
I haven't been able to spot what's different between the 1.5 and 1.4 releases
(1.4 still deploys fine).
Hoping someone has an idea of what might be wrong.
Thanks, Jim
java.lang.IllegalStateException: Cannot receive event after a delete event received
    at io.javaoperatorsdk.operator.processing.event.ResourceState.markEventReceived(ResourceState.java:83)
    at io.javaoperatorsdk.operator.processing.event.EventProcessor.markEventReceived(EventProcessor.java:197)
    at io.javaoperatorsdk.operator.processing.event.EventProcessor.handleEventMarking(EventProcessor.java:186)
    at io.javaoperatorsdk.operator.processing.event.EventProcessor.handleEvent(EventProcessor.java:104)
    at io.javaoperatorsdk.operator.processing.event.source.informer.InformerEventSource.lambda$propagateEvent$3(InformerEventSource.java:200)
    at java.lang.Iterable.forEach(Unknown Source)[?:?]
    at io.javaoperatorsdk.operator.processing.event.source.informer.InformerEventSource.propagateEvent(InformerEventSource.java:190)
    at io.javaoperatorsdk.operator.processing.event.source.informer.InformerEventSource.onDelete(InformerEventSource.java:140)
    at io.javaoperatorsdk.operator.processing.event.source.informer.InformerEventSource.onDelete(InformerEventSource.java:67)
    at io.fabric8.kubernetes.client.informers.impl.cache.ProcessorListener$DeleteNotification.handle(ProcessorListener.java:122)
    at io.fabric8.kubernetes.client.informers.impl.cache.ProcessorListener.add(ProcessorListener.java:50)
    at io.fabric8.kubernetes.client.informers.impl.cache.SharedProcessor.lambda$distribute$0(SharedProcessor.java:87)
    at io.fabric8.kubernetes.client.informers.impl.cache.SharedProcessor.lambda$distribute$1(SharedProcessor.java:110)
    at io.fabric8.kubernetes.client.utils.internal.SerialExecutor.lambda$execute$0(SerialExecutor.java:58)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)