Hey Gyula, dev-team,

I deployed rc-2 with helm on AWS EKS with HA enabled (3 pods).

The operator watches 3 namespaces.

I successfully deployed an application cluster (Flink 1.17) via a pod template,
but I encountered the following errors:

  1.
org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException","message":"Failure executing: GET at: https://172.20.0.1/api/v1/nodes. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. nodes is forbidden: User "system:serviceaccount:dev-0-flink-clusters:dev-0-xsight-flink-operator-sa" cannot list resource "nodes" in API group "" at the cluster scope."
The role seems to be correct. I commented on the following ticket:
https://issues.apache.org/jira/browse/FLINK-32041
In addition, I noticed that kubernetes.rest-service.exposed.type was set to
NodePort; once I changed it to ClusterIP, the above error disappeared. [1]

Is there any chance it looks for a kube.config file instead of using the
service account?

  2.  When the cluster is deleted, the idle pods (not leaders) repeatedly throw
the following error:
[2023-05-14T12:00:50,388][Error] {} [i.f.k.c.i.i.c.SharedProcessor]: apps/v1/namespaces/dev-0-flink-shadow-clusters/deployments failed invoking InformerEventSource{resourceClass: Deployment} event handler: Cannot receive event after a delete event received
java.lang.IllegalStateException: Cannot receive event after a delete event received (stack trace enclosed)

In addition, I'm not sure whether it's an issue or not, but the autoscaler
configurations (per cluster) are shown neither in the Flink web UI nor in the
response when calling /jobmanager/config.
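
For anyone who wants to reproduce this check outside the web UI, here is a
minimal Python sketch. It assumes the /jobmanager/config response is a JSON
array of objects with "key" and "value" fields. The function names, the
"autoscaler" substring filter, and the base URL in the usage note are
illustrative assumptions, not taken from the operator or from this cluster.

```python
import json
from urllib.request import urlopen


def fetch_jobmanager_config(base_url):
    """Fetch and parse <base_url>/jobmanager/config (assumed to be a JSON array)."""
    url = f"{base_url.rstrip('/')}/jobmanager/config"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def filter_config(entries, needle="autoscaler"):
    """Return {key: value} for every entry whose key contains `needle`.

    `entries` is expected to be a list of {"key": ..., "value": ...} dicts.
    The default needle is an assumption; adjust it to the option prefix in use.
    """
    return {e["key"]: e["value"] for e in entries if needle in e["key"]}
```

Usage would look like filter_config(fetch_jobmanager_config("http://localhost:8081")),
where the URL is a placeholder for a port-forwarded JobManager REST endpoint.
An empty result would match the behaviour described above.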

[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#accessing-flinks-web-ui

Thanks,
Tamir
________________________________
From: Jim Busche <jbus...@us.ibm.com>
Sent: Saturday, May 13, 2023 5:59 PM
To: dev@flink.apache.org <dev@flink.apache.org>; Hao t Chang 
<htch...@us.ibm.com>; Anthony Garrard <garr...@uk.ibm.com>
Subject: Re: [VOTE] Apache Flink Kubernetes Operator Release 1.5.0, release 
candidate #2

Hi Guyla,

I was able to deploy rc-2 with helm on a kind cluster and it was able to deploy 
the sample.  But I'm still struggling on OpenShift with rc-2.  There's some 
kind of RBAC permission issue that I haven't been able to solve when it deploys 
the flinkdep or flinksessionjobs.


oc get flinkdep
NAME                                    JOB STATUS   LIFECYCLE STATE
basic-example                                        UPGRADING
basic-session-deployment-only-example                UPGRADING

oc get flinksessionjobs
NAME                             JOB STATUS   LIFECYCLE STATE
basic-session-job-only-example


oc describe flinkdep basic-example
…
Status:
  Cluster Info:
  Error:
{"type":"org.apache.flink.kubernetes.operator.exception.ReconciliationException","message":"org.apache.flink.client.deployment.ClusterDeploymentException: Could not create Kubernetes cluster \"basic-example\".","throwableList":[{"type":"org.apache.flink.client.deployment.ClusterDeploymentException","message":"Could not create Kubernetes cluster \"basic-example\"."},{"type":"org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException","message":"Failure executing: POST at: https://172.30.0.1/apis/apps/v1/namespaces/default/deployments. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. deployments.apps \"basic-example\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>."}]}
  Job Manager Deployment Status:  MISSING

I haven't been able to spot what's different between the 1.5 and 1.4 releases
(1.4 still deploys fine).
Hoping someone has an idea of what might be wrong.

Thanks, Jim

java.lang.IllegalStateException: Cannot receive event after a delete event received
        at io.javaoperatorsdk.operator.processing.event.ResourceState.markEventReceived(ResourceState.java:83)
        at io.javaoperatorsdk.operator.processing.event.EventProcessor.markEventReceived(EventProcessor.java:197)
        at io.javaoperatorsdk.operator.processing.event.EventProcessor.handleEventMarking(EventProcessor.java:186)
        at io.javaoperatorsdk.operator.processing.event.EventProcessor.handleEvent(EventProcessor.java:104)
        at io.javaoperatorsdk.operator.processing.event.source.informer.InformerEventSource.lambda$propagateEvent$3(InformerEventSource.java:200)
        at java.lang.Iterable.forEach(Unknown Source)[?:?]
        at io.javaoperatorsdk.operator.processing.event.source.informer.InformerEventSource.propagateEvent(InformerEventSource.java:190)
        at io.javaoperatorsdk.operator.processing.event.source.informer.InformerEventSource.onDelete(InformerEventSource.java:140)
        at io.javaoperatorsdk.operator.processing.event.source.informer.InformerEventSource.onDelete(InformerEventSource.java:67)
        at io.fabric8.kubernetes.client.informers.impl.cache.ProcessorListener$DeleteNotification.handle(ProcessorListener.java:122)
        at io.fabric8.kubernetes.client.informers.impl.cache.ProcessorListener.add(ProcessorListener.java:50)
        at io.fabric8.kubernetes.client.informers.impl.cache.SharedProcessor.lambda$distribute$0(SharedProcessor.java:87)
        at io.fabric8.kubernetes.client.informers.impl.cache.SharedProcessor.lambda$distribute$1(SharedProcessor.java:110)
        at io.fabric8.kubernetes.client.utils.internal.SerialExecutor.lambda$execute$0(SerialExecutor.java:58)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
