[ 
https://issues.apache.org/jira/browse/FLINK-36513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan updated FLINK-36513:
----------------------------
    Description: 
Many CI failures are caused by the "Install cert-manager" step, for example:

[https://github.com/apache/flink-kubernetes-operator/actions/runs/11292388626/job/31408603383]

[https://github.com/apache/flink-kubernetes-operator/actions/runs/11294831791/job/31436330397]
h1. Root cause:

I checked the raw log [1]; the failure reason is: _Unable to connect to the 
server: dial tcp 140.82.113.3:443: i/o timeout._

!image-2024-10-12-10-30-38-781.png|width=1354,height=492!

 

CI code: 
[https://github.com/apache/flink-kubernetes-operator/blob/d2c01737c745979c6aadb670334565ee11aa2f4a/.github/workflows/ci.yml#L227]

 

This step downloads cert-manager.yaml from GitHub, and 140.82.113.3:443 is a 
GitHub IP address and port. So the failed download of cert-manager.yaml is the 
root cause.

 
h1. Solution:
 * Solution 1: Introduce a retry mechanism
 ** First download cert-manager.yaml, retrying on failure
 ** Then run kubectl apply -f on the local file
 * Solution 2: Check cert-manager.yaml into the flink-kubernetes-operator repo 
directly
 ** The CI uses a fixed version, so cert-manager.yaml is immutable IIUC.
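
Solution 1 could be sketched roughly as follows. This is only an illustration, not the actual CI change: the helper names retry and install_cert_manager, the attempt count (5), and the delay (10 seconds) are assumptions, and the manifest URL is passed in as a parameter rather than spelled out here.

```shell
# Sketch only: a generic retry helper plus a download-then-apply step.

# retry <max_attempts> <delay_seconds> <command...>
# Runs the command until it succeeds or max_attempts is reached.
retry() {
  local max_attempts
  local delay
  local attempt
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "Attempt ${attempt} failed, retrying in ${delay}s: $*" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# install_cert_manager <manifest_url>
# Downloads the manifest to a local file with retries, then applies the
# local copy so that kubectl itself never has to reach GitHub.
install_cert_manager() {
  local url
  local manifest
  url="$1"
  manifest="cert-manager.yaml"
  retry 5 10 curl --fail --silent --show-error --location \
    --output "$manifest" "$url" || return 1
  kubectl apply -f "$manifest"
}
```

In the workflow step this might be called as install_cert_manager "$CERT_MANAGER_URL", where CERT_MANAGER_URL is a hypothetical variable holding the URL that ci.yml currently passes to kubectl apply -f.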

 

[1] 
[https://github.com/apache/flink-kubernetes-operator/commit/d2c01737c745979c6aadb670334565ee11aa2f4a/checks/31436330397/logs]

 



> A lot of CI failures are caused by Install cert-manager
> -------------------------------------------------------
>
>                 Key: FLINK-36513
>                 URL: https://issues.apache.org/jira/browse/FLINK-36513
>             Project: Flink
>          Issue Type: Technical Debt
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.9.0
>            Reporter: Rui Fan
>            Assignee: Rui Fan
>            Priority: Major
>         Attachments: image-2024-10-12-10-30-38-781.png
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
