Kumar Mallikarjuna created FLINK-38047:
------------------------------------------

             Summary: Bump cert-manager in the Kubernetes Operator
                 Key: FLINK-38047
                 URL: https://issues.apache.org/jira/browse/FLINK-38047
             Project: Flink
          Issue Type: Technical Debt
          Components: Kubernetes Operator
            Reporter: Kumar Mallikarjuna


Flink Kubernetes Operator currently uses cert-manager:{_}v1.8.2{_} in the [CI|https://github.com/apache/flink-kubernetes-operator/blob/main/e2e-tests/cert-manager.yaml] and recommends the same version in the [docs|https://github.com/apache/flink-kubernetes-operator/blob/8812c78cd6a2c0ad1b672ca08a8b880bd890ae8b/docs/content/docs/try-flink-kubernetes-operator/quick-start.md?plain=1#L69-L72]. The latest stable release, _v1.18.2_, is ten minor versions ahead. We should bump both the recommendation and the tests to the latest release.
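The "ten minor versions" figure is the difference between the minor components of the two tags. Below is a minimal Python sketch that assumes plain {{vMAJOR.MINOR.PATCH}} tags. The helper names are hypothetical and are not part of any project tooling.

```python
def parse_version(tag: str) -> tuple[int, int, int]:
    """Parse a tag such as 'v1.18.2' into (major, minor, patch)."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def minor_gap(current: str, latest: str) -> int:
    """Count minor releases between two tags that share a major version."""
    cur, new = parse_version(current), parse_version(latest)
    if cur[0] != new[0]:
        raise ValueError("major versions differ; a minor-version gap is not meaningful")
    return new[1] - cur[1]


if __name__ == "__main__":
    print(minor_gap("v1.8.2", "v1.18.2"))  # prints 10
```

The tags must be compared as parsed integer tuples, not as raw strings. As strings, {{"v1.8.2"}} sorts after {{"v1.18.2"}}.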

 

Validation of _cert-manager:v1.18.2_ with {_}flink-kubernetes-operator:v1.12.0{_}:

 

1. Start a kind cluster

 
{code:java}
➜  flink-kubernetes-operator git:(main) ✗ kind create cluster
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.32.2) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a nice day! 👋
{code}
 

2. Install cert-manager v1.18.2

 
{code:java}
➜  flink-kubernetes-operator git:(main) ✗ kubectl create -f https://github.com/cert-manager/cert-manager/releases/download/v1.18.2/cert-manager.yaml
namespace/cert-manager created
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created
serviceaccount/cert-manager-cainjector created
serviceaccount/cert-manager created
serviceaccount/cert-manager-webhook created
clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrole.rbac.authorization.k8s.io/cert-manager-cluster-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-edit created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
clusterrole.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
role.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
role.rbac.authorization.k8s.io/cert-manager:leaderelection created
role.rbac.authorization.k8s.io/cert-manager-tokenrequest created
role.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
rolebinding.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager-cert-manager-tokenrequest created
rolebinding.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
service/cert-manager-cainjector created
service/cert-manager created
service/cert-manager-webhook created
deployment.apps/cert-manager-cainjector created
deployment.apps/cert-manager created
deployment.apps/cert-manager-webhook created
mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
{code}
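As a quick sanity check on an install like the one above, you can tally the {{kubectl create}} output by resource kind. The run above created 6 CRDs and 3 Deployments. Below is a minimal sketch that assumes the output lines are unwrapped {{<kind>/<name> created}} pairs. {{count_created}} is a hypothetical helper.

```python
from collections import Counter


def count_created(output: str) -> Counter:
    """Tally '<kind>/<name> created' lines from `kubectl create -f` output by kind."""
    counts: Counter = Counter()
    for line in output.splitlines():
        parts = line.split()
        # Only count lines shaped exactly like '<kind>/<name> created'.
        if len(parts) == 2 and parts[1] == "created" and "/" in parts[0]:
            kind = parts[0].split("/", 1)[0]
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    sample = """\
namespace/cert-manager created
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
deployment.apps/cert-manager created
deployment.apps/cert-manager-webhook created
"""
    counts = count_created(sample)
    print(counts["customresourcedefinition.apiextensions.k8s.io"], counts["deployment.apps"])  # prints 2 2
```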
 

 

3. Wait for cert-manager to be ready

 
{code:java}
➜  flink-kubernetes-operator git:(main) ✗ k -n cert-manager get po
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-69f748766f-28s8d              1/1     Running   0          44s
cert-manager-cainjector-7cf6557c49-gdfd7   1/1     Running   0          44s
cert-manager-webhook-58f4cff74d-kz4pc      1/1     Running   0          44s 
{code}
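{{kubectl wait --for=condition=Ready pod --all -n cert-manager --timeout=120s}} blocks until these pods are ready. Below is a minimal Python sketch of the same check applied to the table printed above. It assumes the default {{kubectl get pods}} column layout. {{pods_ready}} is a hypothetical helper.

```python
def pods_ready(table: str) -> bool:
    """True if every pod row in `kubectl get pods` output is Running with all containers ready."""
    lines = [line for line in table.strip().splitlines() if line.strip()]
    rows = [line.split() for line in lines[1:]]  # skip the header row
    if not rows:
        return False
    for row in rows:
        ready, status = row[1], row[2]  # READY is 'up/total', e.g. '1/1'
        up, total = ready.split("/")
        if status != "Running" or up != total:
            return False
    return True


if __name__ == "__main__":
    table = """\
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-69f748766f-28s8d              1/1     Running   0          44s
cert-manager-cainjector-7cf6557c49-gdfd7   1/1     Running   0          44s
cert-manager-webhook-58f4cff74d-kz4pc      1/1     Running   0          44s
"""
    print(pods_ready(table))  # prints True
```

The same function also covers step 5, where the operator pod must report {{2/2}} (operator and webhook containers).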
 


4. Install flink-kubernetes-operator

 
{code:java}
➜  flink-kubernetes-operator git:(main) ✗ helm install flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator
W0704 14:33:26.593488   51760 warnings.go:70] spec.privateKey.rotationPolicy: In cert-manager >= v1.18.0, the default value changed from `Never` to `Always`.
NAME: flink-kubernetes-operator
LAST DEPLOYED: Fri Jul  4 14:33:25 2025
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
{code}
 

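*Note:* The warning about {{spec.privateKey.rotationPolicy}} is expected. Since cert-manager v1.18.0 the default is {{Always}}, so a new private key is generated whenever the certificate is reissued. This does not affect the operator/webhook, which reloads its keystore when the certificate changes (see steps 8 to 10).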

 

5. Verify the operator/webhook are running

 
{code:java}
➜  flink-kubernetes-operator git:(main) ✗ k get po
NAME                                         READY   STATUS    RESTARTS   AGE
flink-kubernetes-operator-7dc7858566-42g5z   2/2     Running   0          112s
{code}
 

 

6. Test with a sample FlinkDeployment

 
{code:java}
➜  flink-kubernetes-operator git:(main) ✗ k create -f examples/basic.yaml
flinkdeployment.flink.apache.org/basic-example created
 
➜  flink-kubernetes-operator git:(main) ✗ k get flinkdeployments.flink.apache.org
NAME            JOB STATUS   LIFECYCLE STATE
basic-example   RUNNING      STABLE

➜  flink-kubernetes-operator git:(main) ✗ k get po
NAME                                         READY   STATUS    RESTARTS   AGE
basic-example-6c7bff5c68-w669x               1/1     Running   0          70s
basic-example-taskmanager-1-1                1/1     Running   0          23s
flink-kubernetes-operator-7dc7858566-42g5z   2/2     Running   0          3m27s
{code}
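This step passes when the job is {{RUNNING}} and the lifecycle state is {{STABLE}}. Below is a minimal sketch that reads both values from the {{kubectl get flinkdeployments.flink.apache.org}} table. It assumes every column is populated, so whitespace splitting keeps the columns aligned. {{is_healthy}} is a hypothetical helper.

```python
def deployment_states(table: str) -> dict[str, tuple[str, str]]:
    """Map FlinkDeployment name -> (JOB STATUS, LIFECYCLE STATE)."""
    states: dict[str, tuple[str, str]] = {}
    for line in table.strip().splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) >= 3:
            states[cols[0]] = (cols[1], cols[2])
    return states


def is_healthy(table: str, name: str) -> bool:
    """True when the named deployment reports a RUNNING job in the STABLE lifecycle state."""
    return deployment_states(table).get(name) == ("RUNNING", "STABLE")


if __name__ == "__main__":
    table = """\
NAME            JOB STATUS   LIFECYCLE STATE
basic-example   RUNNING      STABLE
"""
    print(is_healthy(table, "basic-example"))  # prints True
```

A blank column would shift the split fields. In that case, querying the resource status with {{-o jsonpath}} is more robust.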
 

 

7. Clean up the FlinkDeployment

 
{code:java}
➜  flink-kubernetes-operator git:(main) ✗ k delete flinkdeployments.flink.apache.org basic-example
flinkdeployment.flink.apache.org "basic-example" deleted {code}
 


8. Force a certificate rotation

 
{code:java}
➜  flink-kubernetes-operator git:(main) ✗ k get certificate
NAME                          READY   SECRET                AGE
flink-operator-serving-cert   True    webhook-server-cert   4m48s

➜  flink-kubernetes-operator git:(main) ✗ k get certificate flink-operator-serving-cert -oyaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  annotations:
    meta.helm.sh/release-name: flink-kubernetes-operator
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2025-07-04T09:03:26Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
  name: flink-operator-serving-cert
  namespace: default
  resourceVersion: "997"
  uid: b0e1935c-eab8-4b61-ad9f-7bb0bf166c07
spec:
  commonName: FlinkDeployment Validator
  dnsNames:
  - flink-operator-webhook-service.default.svc
  - flink-operator-webhook-service.default.svc.cluster.local
  issuerRef:
    kind: Issuer
    name: flink-operator-selfsigned-issuer
  keystores:
    pkcs12:
      create: true
      passwordSecretRef:
        key: password
        name: flink-operator-webhook-secret
  secretName: webhook-server-cert
status:
  conditions:
  - lastTransitionTime: "2025-07-04T09:03:26Z"
    message: Certificate is up to date and has not expired
    observedGeneration: 1
    reason: Ready
    status: "True"
    type: Ready
  notAfter: "2025-10-02T09:03:26Z"
  notBefore: "2025-07-04T09:03:26Z"
  renewalTime: "2025-09-02T09:03:26Z"
  revision: 1

➜  flink-kubernetes-operator git:(main) ✗ cmctl renew flink-operator-serving-cert
Manually triggered issuance of Certificate default/flink-operator-serving-cert

➜  flink-kubernetes-operator git:(main) ✗ k get certificate flink-operator-serving-cert -oyaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  annotations:
    meta.helm.sh/release-name: flink-kubernetes-operator
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2025-07-04T09:03:26Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
  name: flink-operator-serving-cert
  namespace: default
  resourceVersion: "1591"
  uid: b0e1935c-eab8-4b61-ad9f-7bb0bf166c07
spec:
  commonName: FlinkDeployment Validator
  dnsNames:
  - flink-operator-webhook-service.default.svc
  - flink-operator-webhook-service.default.svc.cluster.local
  issuerRef:
    kind: Issuer
    name: flink-operator-selfsigned-issuer
  keystores:
    pkcs12:
      create: true
      passwordSecretRef:
        key: password
        name: flink-operator-webhook-secret
  secretName: webhook-server-cert
status:
  conditions:
  - lastTransitionTime: "2025-07-04T09:03:26Z"
    message: Certificate is up to date and has not expired
    observedGeneration: 1
    reason: Ready
    status: "True"
    type: Ready
  notAfter: "2025-10-02T09:08:37Z"
  notBefore: "2025-07-04T09:08:37Z"
  renewalTime: "2025-09-02T09:08:37Z"
  revision: 2 {code}
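You can compare the two {{status}} blocks above mechanically: a new issuance bumps {{revision}} and moves the validity window forward. The timestamps also match cert-manager's documented defaults for a Certificate that sets neither {{duration}} nor {{renewBefore}}: a 90-day lifetime, renewed when one third of it remains. For example, 2025-10-02T09:03:26Z minus 30 days is the {{renewalTime}} shown, 2025-09-02T09:03:26Z.

Below is a minimal sketch of both checks. It assumes the status fields have already been loaded into dicts, for example from {{kubectl get certificate flink-operator-serving-cert -o json}}. The helper names are hypothetical.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse a UTC timestamp in the form used by cert-manager status, e.g. '2025-07-04T09:03:26Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def expected_renewal(not_before: str, not_after: str) -> datetime:
    """Renewal time assuming the default renewBefore of one third of the lifetime."""
    start, end = parse_ts(not_before), parse_ts(not_after)
    return end - (end - start) / 3


def was_rotated(before: dict, after: dict) -> bool:
    """True if the status shows a newer issuance: higher revision and a later notBefore."""
    return (
        after["revision"] > before["revision"]
        and parse_ts(after["notBefore"]) > parse_ts(before["notBefore"])
    )


if __name__ == "__main__":
    # Status values copied from the two `kubectl get certificate` outputs above.
    before = {"revision": 1, "notBefore": "2025-07-04T09:03:26Z", "notAfter": "2025-10-02T09:03:26Z"}
    after = {"revision": 2, "notBefore": "2025-07-04T09:08:37Z", "notAfter": "2025-10-02T09:08:37Z"}
    print(was_rotated(before, after))  # prints True
    print(expected_renewal(after["notBefore"], after["notAfter"]).isoformat())  # prints 2025-09-02T09:08:37
```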

9. Verify the operator/webhook are still running

 

 
{code:java}
➜  flink-kubernetes-operator git:(main) ✗ k get po
NAME                                         READY   STATUS    RESTARTS   AGE
flink-kubernetes-operator-7dc7858566-42g5z   2/2     Running   0          5m50s 
{code}
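The key detail in this step is that the pod name and restart count are unchanged. This means the new certificate was picked up without restarting the pod. Below is a minimal sketch of that comparison across two {{kubectl get pods}} tables. {{survived_rotation}} is a hypothetical helper.

```python
def pod_restarts(table: str) -> dict[str, int]:
    """Map pod name -> RESTARTS count from `kubectl get pods` output."""
    restarts: dict[str, int] = {}
    for line in table.strip().splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) >= 4:
            # RESTARTS is the 4th column; newer kubectl may append '(5m ago)' as extra fields.
            restarts[cols[0]] = int(cols[3])
    return restarts


def survived_rotation(before: str, after: str, pod: str) -> bool:
    """True if `pod` is listed in both tables with the same restart count."""
    b, a = pod_restarts(before), pod_restarts(after)
    return pod in b and pod in a and a[pod] == b[pod]


if __name__ == "__main__":
    pod = "flink-kubernetes-operator-7dc7858566-42g5z"
    header = "NAME  READY  STATUS  RESTARTS  AGE\n"
    before = header + f"{pod}  2/2  Running  0  112s\n"
    after = header + f"{pod}  2/2  Running  0  5m50s\n"
    print(survived_rotation(before, after, pod))  # prints True
```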

10. Check the webhook logs and verify that the certificate was reloaded

 

 
{code:java}
➜  flink-kubernetes-operator git:(main) ✗ k logs flink-kubernetes-operator-7dc7858566-42g5z -c flink-webhook | tail -20
2025-07-04 09:03:57,113 o.a.f.k.o.f.FileSystemWatchService [INFO ] Starting watching path: /certs
2025-07-04 09:03:57,117 o.a.f.k.o.f.FileSystemWatchService [INFO ] Path is resolved to real path: /certs
2025-07-04 09:03:57,186 o.a.f.k.o.a.FlinkOperatorWebhook [INFO ] Webhook listening at 0:0:0:0:0:0:0:0:9443
2025-07-04 09:08:47,807 o.a.f.k.o.a.FlinkOperatorWebhook [INFO ] Reloading SSL context because of certificate change
2025-07-04 09:08:47,809 o.a.f.k.o.s.ReloadableSslContext [INFO ] Creating keystore with type: pkcs12
2025-07-04 09:08:47,810 o.a.f.k.o.s.ReloadableSslContext [INFO ] Loading keystore from file: /certs/keystore.p12
2025-07-04 09:08:47,816 o.a.f.k.o.s.ReloadableSslContext [INFO ] Initializing key manager with keystore and password
2025-07-04 09:08:47,821 o.a.f.k.o.a.FlinkOperatorWebhook [INFO ] SSL context reloaded successfully
2025-07-04 09:08:56,977 o.a.f.c.GlobalConfiguration    [INFO ] Using legacy YAML parser to load flink configuration file from /opt/flink/conf/flink-conf.yaml.
2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration    [INFO ] Loading configuration property: parallelism.default, 1
2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration    [INFO ] Loading configuration property: taskmanager.numberOfTaskSlots, 1
2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration    [INFO ] Loading configuration property: kubernetes.operator.default-configuration.flink-version.v1_18.env.java.opts.all, --add-exports=java.base/sun.net.util=ALL-UNNAMED --add-exports=java.rmi/sun.rmi.registry=ALL-UNNAMED --add-exports=java.security.jgss/sun.security.krb5=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.text=ALL-UNNAMED --add-opens=java.base/java.time=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED
2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration    [INFO ] Loading configuration property: kubernetes.operator.reconcile.interval, 15 s
2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration    [INFO ] Loading configuration property: kubernetes.operator.default-configuration.flink-version.v1_19+.env.java.default-opts.all, --add-exports=java.base/sun.net.util=ALL-UNNAMED --add-exports=java.rmi/sun.rmi.registry=ALL-UNNAMED --add-exports=java.security.jgss/sun.security.krb5=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.text=ALL-UNNAMED --add-opens=java.base/java.time=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED
2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration    [INFO ] Loading configuration property: kubernetes.operator.metrics.reporter.slf4j.interval, 5 MINUTE
2025-07-04 09:08:56,983 o.a.f.c.GlobalConfiguration    [INFO ] Loading configuration property: kubernetes.operator.observer.progress-check.interval, 5 s
2025-07-04 09:08:56,983 o.a.f.c.GlobalConfiguration    [INFO ] Loading configuration property: kubernetes.operator.health.probe.enabled, true
2025-07-04 09:08:56,983 o.a.f.c.GlobalConfiguration    [INFO ] Loading configuration property: kubernetes.operator.health.probe.port, 8085
2025-07-04 09:08:56,983 o.a.f.c.GlobalConfiguration    [INFO ] Loading configuration property: kubernetes.operator.metrics.reporter.slf4j.factory.class, org.apache.flink.metrics.slf4j.Slf4jReporterFactory
2025-07-04 09:08:56,984 o.a.f.k.o.c.FlinkConfigManager [INFO ] Default configuration did not change, nothing to do... {code}
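The {{FlinkOperatorWebhook}} messages above confirm the reload. Below is a minimal sketch that checks for a reload-start message followed by a success message. It uses the message text from this log and assumes unwrapped log lines, such as the raw output of {{kubectl logs <pod> -c flink-webhook}}. {{reload_succeeded}} is a hypothetical helper.

```python
# Message text taken from the flink-webhook log output above.
RELOAD_START = "Reloading SSL context because of certificate change"
RELOAD_DONE = "SSL context reloaded successfully"


def reload_succeeded(log: str) -> bool:
    """True if a reload-start message is followed, later in the log, by a success message."""
    started = False
    for line in log.splitlines():
        if RELOAD_START in line:
            started = True
        elif started and RELOAD_DONE in line:
            return True
    return False


if __name__ == "__main__":
    log = (
        "2025-07-04 09:08:47,807 o.a.f.k.o.a.FlinkOperatorWebhook [INFO ] "
        "Reloading SSL context because of certificate change\n"
        "2025-07-04 09:08:47,821 o.a.f.k.o.a.FlinkOperatorWebhook [INFO ] "
        "SSL context reloaded successfully\n"
    )
    print(reload_succeeded(log))  # prints True
```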

11. Create a resource to test the webhook

 

 
{code:java}
➜  flink-kubernetes-operator git:(main) ✗ k create -f examples/basic.yaml
flinkdeployment.flink.apache.org/basic-example created {code}

12. Check the resource status

 

 
{code:java}
➜  flink-kubernetes-operator git:(main) ✗ k get flinkdeployments.flink.apache.org
NAME            JOB STATUS   LIFECYCLE STATE
basic-example   RUNNING      STABLE

➜  flink-kubernetes-operator git:(main) ✗ k get po
NAME                                         READY   STATUS    RESTARTS   AGE
basic-example-6c7bff5c68-gmlh2               1/1     Running   0          25s
basic-example-taskmanager-1-1                1/1     Running   0          14s
flink-kubernetes-operator-7dc7858566-42g5z   2/2     Running   0          7m28s 
{code}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
