Kumar Mallikarjuna created FLINK-38047:
------------------------------------------

             Summary: Bump cert-manager in the Kubernetes Operator
                 Key: FLINK-38047
                 URL: https://issues.apache.org/jira/browse/FLINK-38047
             Project: Flink
          Issue Type: Technical Debt
          Components: Kubernetes Operator
            Reporter: Kumar Mallikarjuna


The Flink Kubernetes Operator currently uses cert-manager:{_}v1.8.2{_} in the [CI|https://github.com/apache/flink-kubernetes-operator/blob/main/e2e-tests/cert-manager.yaml] and recommends the same version in the [docs|https://github.com/apache/flink-kubernetes-operator/blob/8812c78cd6a2c0ad1b672ca08a8b880bd890ae8b/docs/content/docs/try-flink-kubernetes-operator/quick-start.md?plain=1#L69-L72]. The latest stable release, _v1.18.2_, is ten minor versions ahead. We should bump the recommendation and the tests to the latest release.

Validation of _cert-manager:v1.18.2_ with {_}flink-kubernetes-operator:v1.12.0{_}:

1. Start a kind cluster
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ kind create cluster
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.32.2) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a nice day! 👋
{code}
2.
Install cert-manager v1.18.2
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ kubectl create -f https://github.com/cert-manager/cert-manager/releases/download/v1.18.2/cert-manager.yaml
namespace/cert-manager created
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created
serviceaccount/cert-manager-cainjector created
serviceaccount/cert-manager created
serviceaccount/cert-manager-webhook created
clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrole.rbac.authorization.k8s.io/cert-manager-cluster-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-edit created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
clusterrole.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
role.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
role.rbac.authorization.k8s.io/cert-manager:leaderelection created
role.rbac.authorization.k8s.io/cert-manager-tokenrequest created
role.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
rolebinding.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager-cert-manager-tokenrequest created
rolebinding.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
service/cert-manager-cainjector created
service/cert-manager created
service/cert-manager-webhook created
deployment.apps/cert-manager-cainjector created
deployment.apps/cert-manager created
deployment.apps/cert-manager-webhook created
mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
{code}
3.
Wait for cert-manager to be ready
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ k -n cert-manager get po
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-69f748766f-28s8d              1/1     Running   0          44s
cert-manager-cainjector-7cf6557c49-gdfd7   1/1     Running   0          44s
cert-manager-webhook-58f4cff74d-kz4pc      1/1     Running   0          44s
{code}
4. Install flink-kubernetes-operator
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ helm install flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator
W0704 14:33:26.593488 51760 warnings.go:70] spec.privateKey.rotationPolicy: In cert-manager >= v1.18.0, the default value changed from `Never` to `Always`.
NAME: flink-kubernetes-operator
LAST DEPLOYED: Fri Jul 4 14:33:25 2025
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
{code}
*Note:* The warning about `spec.privateKey.rotationPolicy` is expected and can be ignored since it does not affect the functionality of the operator/webhook.

5. Verify the operator/webhook are running
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ k get po
NAME                                         READY   STATUS    RESTARTS   AGE
flink-kubernetes-operator-7dc7858566-42g5z   2/2     Running   0          112s
{code}
6. Test with a sample FlinkDeployment
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ k create -f examples/basic.yaml
flinkdeployment.flink.apache.org/basic-example created

➜ flink-kubernetes-operator git:(main) ✗ k get flinkdeployments.flink.apache.org
NAME            JOB STATUS   LIFECYCLE STATE
basic-example   RUNNING      STABLE

➜ flink-kubernetes-operator git:(main) ✗ k get po
NAME                                         READY   STATUS    RESTARTS   AGE
basic-example-6c7bff5c68-w669x               1/1     Running   0          70s
basic-example-taskmanager-1-1                1/1     Running   0          23s
flink-kubernetes-operator-7dc7858566-42g5z   2/2     Running   0          3m27s
{code}
7. Clean up the FlinkDeployment
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ k delete flinkdeployments.flink.apache.org basic-example
flinkdeployment.flink.apache.org "basic-example" deleted
{code}
8.
Force rotate the certificate
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ k get certificate
NAME                          READY   SECRET                AGE
flink-operator-serving-cert   True    webhook-server-cert   4m48s

➜ flink-kubernetes-operator git:(main) ✗ k get certificate flink-operator-serving-cert -oyaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  annotations:
    meta.helm.sh/release-name: flink-kubernetes-operator
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2025-07-04T09:03:26Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
  name: flink-operator-serving-cert
  namespace: default
  resourceVersion: "997"
  uid: b0e1935c-eab8-4b61-ad9f-7bb0bf166c07
spec:
  commonName: FlinkDeployment Validator
  dnsNames:
  - flink-operator-webhook-service.default.svc
  - flink-operator-webhook-service.default.svc.cluster.local
  issuerRef:
    kind: Issuer
    name: flink-operator-selfsigned-issuer
  keystores:
    pkcs12:
      create: true
      passwordSecretRef:
        key: password
        name: flink-operator-webhook-secret
  secretName: webhook-server-cert
status:
  conditions:
  - lastTransitionTime: "2025-07-04T09:03:26Z"
    message: Certificate is up to date and has not expired
    observedGeneration: 1
    reason: Ready
    status: "True"
    type: Ready
  notAfter: "2025-10-02T09:03:26Z"
  notBefore: "2025-07-04T09:03:26Z"
  renewalTime: "2025-09-02T09:03:26Z"
  revision: 1

➜ flink-kubernetes-operator git:(main) ✗ cmctl renew flink-operator-serving-cert
Manually triggered issuance of Certificate default/flink-operator-serving-cert

➜ flink-kubernetes-operator git:(main) ✗ k get certificate flink-operator-serving-cert -oyaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  annotations:
    meta.helm.sh/release-name: flink-kubernetes-operator
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2025-07-04T09:03:26Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
  name: flink-operator-serving-cert
  namespace: default
  resourceVersion: "1591"
  uid: b0e1935c-eab8-4b61-ad9f-7bb0bf166c07
spec:
  commonName: FlinkDeployment Validator
  dnsNames:
  - flink-operator-webhook-service.default.svc
  - flink-operator-webhook-service.default.svc.cluster.local
  issuerRef:
    kind: Issuer
    name: flink-operator-selfsigned-issuer
  keystores:
    pkcs12:
      create: true
      passwordSecretRef:
        key: password
        name: flink-operator-webhook-secret
  secretName: webhook-server-cert
status:
  conditions:
  - lastTransitionTime: "2025-07-04T09:03:26Z"
    message: Certificate is up to date and has not expired
    observedGeneration: 1
    reason: Ready
    status: "True"
    type: Ready
  notAfter: "2025-10-02T09:08:37Z"
  notBefore: "2025-07-04T09:08:37Z"
  renewalTime: "2025-09-02T09:08:37Z"
  revision: 2
{code}
9. Verify the operator/webhook are still running
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ k get po
NAME                                         READY   STATUS    RESTARTS   AGE
flink-kubernetes-operator-7dc7858566-42g5z   2/2     Running   0          5m50s
{code}
10. Check logs for the webhook and verify if the certificate was reloaded
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ k logs flink-kubernetes-operator-7dc7858566-42g5z -c flink-webhook | tail -20
2025-07-04 09:03:57,113 o.a.f.k.o.f.FileSystemWatchService [INFO ] Starting watching path: /certs
2025-07-04 09:03:57,117 o.a.f.k.o.f.FileSystemWatchService [INFO ] Path is resolved to real path: /certs
2025-07-04 09:03:57,186 o.a.f.k.o.a.FlinkOperatorWebhook [INFO ] Webhook listening at 0:0:0:0:0:0:0:0:9443
2025-07-04 09:08:47,807 o.a.f.k.o.a.FlinkOperatorWebhook [INFO ] Reloading SSL context because of certificate change
2025-07-04 09:08:47,809 o.a.f.k.o.s.ReloadableSslContext [INFO ] Creating keystore with type: pkcs12
2025-07-04 09:08:47,810 o.a.f.k.o.s.ReloadableSslContext [INFO ] Loading keystore from file: /certs/keystore.p12
2025-07-04 09:08:47,816 o.a.f.k.o.s.ReloadableSslContext [INFO ] Initializing key manager with keystore and password
2025-07-04 09:08:47,821 o.a.f.k.o.a.FlinkOperatorWebhook [INFO ] SSL context reloaded successfully
2025-07-04 09:08:56,977 o.a.f.c.GlobalConfiguration [INFO ] Using legacy YAML parser to load flink configuration file from /opt/flink/conf/flink-conf.yaml.
2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration [INFO ] Loading configuration property: parallelism.default, 1
2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration [INFO ] Loading configuration property: taskmanager.numberOfTaskSlots, 1
2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration [INFO ] Loading configuration property: kubernetes.operator.default-configuration.flink-version.v1_18.env.java.opts.all, --add-exports=java.base/sun.net.util=ALL-UNNAMED --add-exports=java.rmi/sun.rmi.registry=ALL-UNNAMED --add-exports=java.security.jgss/sun.security.krb5=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.text=ALL-UNNAMED --add-opens=java.base/java.time=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED
2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration [INFO ] Loading configuration property: kubernetes.operator.reconcile.interval, 15 s
2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration [INFO ] Loading configuration property: kubernetes.operator.default-configuration.flink-version.v1_19+.env.java.default-opts.all, --add-exports=java.base/sun.net.util=ALL-UNNAMED --add-exports=java.rmi/sun.rmi.registry=ALL-UNNAMED --add-exports=java.security.jgss/sun.security.krb5=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.text=ALL-UNNAMED --add-opens=java.base/java.time=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED
2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration [INFO ] Loading configuration property: kubernetes.operator.metrics.reporter.slf4j.interval, 5 MINUTE
2025-07-04 09:08:56,983 o.a.f.c.GlobalConfiguration [INFO ] Loading configuration property: kubernetes.operator.observer.progress-check.interval, 5 s
2025-07-04 09:08:56,983 o.a.f.c.GlobalConfiguration [INFO ] Loading configuration property: kubernetes.operator.health.probe.enabled, true
2025-07-04 09:08:56,983 o.a.f.c.GlobalConfiguration [INFO ] Loading configuration property: kubernetes.operator.health.probe.port, 8085
2025-07-04 09:08:56,983 o.a.f.c.GlobalConfiguration [INFO ] Loading configuration property: kubernetes.operator.metrics.reporter.slf4j.factory.class, org.apache.flink.metrics.slf4j.Slf4jReporterFactory
2025-07-04 09:08:56,984 o.a.f.k.o.c.FlinkConfigManager [INFO ] Default configuration did not change, nothing to do...
{code}
11. Create a resource to test the webhook
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ k create -f examples/basic.yaml
flinkdeployment.flink.apache.org/basic-example created
{code}
12. Check the resource status
{code:java}
➜ flink-kubernetes-operator git:(main) ✗ k get flinkdeployments.flink.apache.org
NAME            JOB STATUS   LIFECYCLE STATE
basic-example   RUNNING      STABLE

➜ flink-kubernetes-operator git:(main) ✗ k get po
NAME                                         READY   STATUS    RESTARTS   AGE
basic-example-6c7bff5c68-gmlh2               1/1     Running   0          25s
basic-example-taskmanager-1-1                1/1     Running   0          14s
flink-kubernetes-operator-7dc7858566-42g5z   2/2     Running   0          7m28s
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)