yogeek opened a new issue, #535:
URL: https://github.com/apache/spark-kubernetes-operator/issues/535

   ## Description
   
   When deploying the spark-kubernetes-operator with `replicas > 1` and leader election enabled, the operator pods restart continuously because of liveness probe failures.
   
   ## Root Cause
   
   The Helm chart allows leader election to be enabled through the operator configuration, but it does not grant the RBAC permissions the operator needs for `coordination.k8s.io/leases`. Without these permissions, the operator cannot create or manage the leader election lease, which leads to the following chain of failures:
   
   1. Leader election fails silently.
   2. Multiple replicas attempt to reconcile the same resources simultaneously.
   3. The informer becomes unhealthy.
   4. The liveness probe returns HTTP 500.
   5. The pods restart.
   
   ## Steps to Reproduce
   
   1. Deploy the Helm chart with:
   ```yaml
   operatorDeployment:
     replicas: 2
   
   operatorConfiguration:
     append: true
     spark-operator.properties: |
       spark.kubernetes.operator.leader.election.enabled=true
       spark.kubernetes.operator.leader.election.lease.name=spark-operator-lease
   ```
   
   2. Observe pods restarting frequently
   3. Check for a lease in the operator namespace with `kubectl get lease -n <namespace>`; no lease exists.
   4. Check logs for errors like:
   ```
   Controller: sparkclusterreconciler, Event Source: ControllerResourceEventSource, Informer: UNHEALTHY
   ```
   
   ## Expected Behavior
   
   The operator should be able to create and manage leases for leader election when `spark.kubernetes.operator.leader.election.enabled=true`.
   
   ## Proposed Fix
   
   Add the following rule to the `operatorRbacRules` template in `helm/spark-kubernetes-operator/templates/operator-rbac.yaml`:
   
   ```yaml
   - apiGroups:
       - "coordination.k8s.io"
     resources:
       - leases
     verbs:
       - '*'
   ```
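   
   Until the chart ships this rule, one possible workaround is to grant the permission outside the chart. The manifest below is an untested sketch that reuses the rule proposed above, assuming the lease lives in the operator namespace (where step 3 looks for it). The Role and RoleBinding name `spark-operator-leader-election` is hypothetical, and `<namespace>` and `<operator-service-account>` are placeholders for the operator namespace and the service account the operator pods run under; replace them before applying.
   
   ```yaml
   # Hypothetical workaround, not part of the chart: grants the operator's
   # service account access to leases in its own namespace.
   apiVersion: rbac.authorization.k8s.io/v1
   kind: Role
   metadata:
     name: spark-operator-leader-election   # hypothetical name
     namespace: <namespace>                 # placeholder: operator namespace
   rules:
     # Same rule as the proposed chart fix
     - apiGroups:
         - "coordination.k8s.io"
       resources:
         - leases
       verbs:
         - '*'
   ---
   apiVersion: rbac.authorization.k8s.io/v1
   kind: RoleBinding
   metadata:
     name: spark-operator-leader-election   # hypothetical name
     namespace: <namespace>
   roleRef:
     apiGroup: rbac.authorization.k8s.io
     kind: Role
     name: spark-operator-leader-election   # must match the Role above
   subjects:
     - kind: ServiceAccount
       name: <operator-service-account>     # placeholder: operator's service account
       namespace: <namespace>
   ```
   
   After `kubectl apply -f`, running `kubectl auth can-i create leases.coordination.k8s.io -n <namespace> --as=system:serviceaccount:<namespace>:<operator-service-account>` should answer `yes` if the binding took effect.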
   
   ## Environment
   
   - Chart version: 1.5.0
   - Operator version: 0.7.0
   - Kubernetes: 1.28+

