Rui Fan created FLINK-36513:
-------------------------------

             Summary: A lot of CI failures are caused by Install cert-manager
                 Key: FLINK-36513
                 URL: https://issues.apache.org/jira/browse/FLINK-36513
             Project: Flink
          Issue Type: Technical Debt
          Components: Kubernetes Operator
    Affects Versions: kubernetes-operator-1.9.0
            Reporter: Rui Fan
            Assignee: Rui Fan
         Attachments: image-2024-10-12-10-30-38-781.png

Many CI runs fail in the "Install cert-manager" step, for example:

[https://github.com/apache/flink-kubernetes-operator/actions/runs/11292388626/job/31408603383]

[https://github.com/apache/flink-kubernetes-operator/actions/runs/11294831791/job/31436330397]
h1. Root cause:

I checked the raw log [1]. The failure reason is: _Unable to connect to the 
server: dial tcp 140.82.113.3:443: i/o timeout._

!image-2024-10-12-10-30-38-781.png|width=2702,height=982!

 

CI code: 
[https://github.com/apache/flink-kubernetes-operator/blob/d2c01737c745979c6aadb670334565ee11aa2f4a/.github/workflows/ci.yml#L227]

 

This step needs to download cert-manager.yaml from GitHub, and 140.82.113.3:443 
is a GitHub IP address and port. So the failed download of cert-manager.yaml is 
the root cause.

 
h1. Solution:
 * Solution 1: Introduce a retry mechanism
 ** Download cert-manager.yaml first, retrying on failure
 ** Then run kubectl apply -f on the local file
 * Solution 2: Put cert-manager.yaml directly in the flink-kubernetes-operator 
repo
 ** The CI uses a fixed version, so cert-manager.yaml is immutable, IIUC.
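
Solution 1 could look roughly like the sketch below. It is only an illustration under assumptions, not the actual CI change: the function names retry and install_cert_manager, the variables MAX_ATTEMPTS, RETRY_DELAY and CERT_MANAGER_URL, and the default attempt count and delay are all hypothetical. The real manifest URL should be taken from the existing workflow step.

```shell
#!/usr/bin/env bash
# Sketch only: all names below (retry, install_cert_manager, MAX_ATTEMPTS,
# RETRY_DELAY, CERT_MANAGER_URL) are hypothetical and not part of the CI.

# retry <max_attempts> <delay_seconds> <command...>
# Runs the command until it succeeds or max_attempts is reached.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Command failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "Attempt ${attempt} failed, retrying in ${delay}s..." >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# install_cert_manager <manifest_url> [local_file]
install_cert_manager() {
  local url="$1" target="${2:-cert-manager.yaml}"
  # Step 1: download the manifest, retrying on network or HTTP errors.
  retry "${MAX_ATTEMPTS:-5}" "${RETRY_DELAY:-10}" \
    curl -fsSL -o "$target" "$url" || return 1
  # Step 2: apply the local copy, so kubectl does not fetch from GitHub itself.
  kubectl apply -f "$target"
}
```

In the workflow this would be called as install_cert_manager "$CERT_MANAGER_URL". curl also has a built-in --retry option, which might be enough on its own. The explicit loop is shown here because it works for any command, not only the download.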

 

[1] 
https://productionresultssa9.blob.core.windows.net/actions-results/1c3ad627-91b6-4db3-a9ad-453109617470/workflow-job-run-3290c02c-bc49-582e-d1d1-63220f3fe3ce/logs/job/job-logs.txt?rsct=text%2Fplain&se=2024-10-12T02%3A29%3A01Z&sig=InSwCX86huA086rqGjAXM836sM8%2Bb8zk5%2FfeVJgmpsM%3D&ske=2024-10-12T13%3A37%3A00Z&skoid=ca7593d4-ee42-46cd-af88-8b886a2f84eb&sks=b&skt=2024-10-12T01%3A37%3A00Z&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skv=2024-08-04&sp=r&spr=https&sr=b&st=2024-10-12T02%3A18%3A56Z&sv=2024-08-04

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
