[ 
https://issues.apache.org/jira/browse/FLINK-24315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17416500#comment-17416500
 ] 

Yangze Guo commented on FLINK-24315:
------------------------------------

Thanks for the information. I think we can leverage
'kubernetes.watch.reconnectInterval' and 'kubernetes.watch.reconnectLimit' to
implement the retry logic. However, the key point of this issue is that we also
need to handle the case where rebuilding the watcher fails.
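A minimal sketch of the idea, assuming a fixed delay between attempts (playing the role of 'kubernetes.watch.reconnectInterval') and a maximum attempt count (playing the role of 'kubernetes.watch.reconnectLimit', with a negative value meaning "retry forever"). It is written in Python for illustration only and is not Flink's actual (Java) code. The names `rebuild_watcher`, `create_watch` and `WatcherRebuildFailed` are hypothetical. What matters is the last step: once the retries are exhausted, the failure is raised to the caller instead of silently leaving Flink without a watcher.

```python
import time
from typing import Callable, Optional, TypeVar

W = TypeVar("W")


class WatcherRebuildFailed(Exception):
    """Hypothetical error raised when every attempt to rebuild the watcher failed."""


def rebuild_watcher(
    create_watch: Callable[[], W],
    reconnect_interval_ms: int,
    reconnect_limit: int,
    sleep: Callable[[float], None] = time.sleep,
) -> W:
    """Re-create a watch, retrying with a fixed delay between attempts.

    Assumption: reconnect_limit < 0 means "retry forever"; otherwise at most
    reconnect_limit attempts are made before giving up.
    """
    attempt = 0
    last_error: Optional[Exception] = None
    while reconnect_limit < 0 or attempt < reconnect_limit:
        attempt += 1
        try:
            return create_watch()
        except Exception as err:  # e.g. API server unreachable, WebSocket timeout
            last_error = err
            # Only wait if another attempt will actually follow.
            if reconnect_limit < 0 or attempt < reconnect_limit:
                sleep(reconnect_interval_ms / 1000.0)
    # Retries exhausted: surface the failure so the caller can react to it
    # (for example by failing fast) instead of running without a watcher.
    raise WatcherRebuildFailed(
        f"could not rebuild watcher after {attempt} attempts"
    ) from last_error


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_create_watch() -> str:
        # Simulates an API server that is unavailable for the first two attempts.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("API server unavailable")
        return "watch-handle"

    watch = rebuild_watcher(flaky_create_watch, reconnect_interval_ms=10, reconnect_limit=5)
    print(watch, calls["n"])  # watch-handle 3
```

A fixed interval keeps the sketch close to the two existing options; exponential backoff would be an alternative if the API server is expected to stay unavailable for longer periods.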

> Cannot rebuild watcher thread while the K8S API server is unavailable
> ---------------------------------------------------------------------
>
>                 Key: FLINK-24315
>                 URL: https://issues.apache.org/jira/browse/FLINK-24315
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.14.0, 1.13.2
>            Reporter: ouyangwulin
>            Priority: Major
>             Fix For: 1.13.3, 1.14.1
>
>
> In the native K8s integration, Flink will try to rebuild the watcher thread if
> the API server is temporarily unavailable. However, if the jitter lasts longer
> than the WebSocket timeout, rebuilding the watcher will time out and Flink can
> no longer handle pod events correctly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
