[ 
https://issues.apache.org/jira/browse/FLINK-28171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17710375#comment-17710375
 ] 

Sergio Sainz edited comment on FLINK-28171 at 4/11/23 2:42 AM:
---------------------------------------------------------------

Hello [~wangyang0918]! We are also experiencing this issue with a native 
Kubernetes deployment with HA enabled (not using the Flink Kubernetes operator).

In [https://lists.apache.org/thread/yl40s9069wksz66qlf9t6jhmwsn59zft] you 
mentioned that "If HA enabled, the internal jobmanager rpc service will not be 
created. Instead, the TaskManager retrieves the JobManager address via HA 
services and connects to it via pod ip."

Do you know whether we can change the way the TaskManager connects to HA 
services, so that it uses the pod name instead of the IP address?

I do not think we can apply the Akka workaround of bypassing the Istio sidecar. 
Thanks for the info ~

 


was (Author: sergiosp):
Hello [~wangyang0918] ! also experiencing this issue with native kubernetes 
deployment with HA enabled (not in flink k8s operator).

In [https://lists.apache.org/thread/yl40s9069wksz66qlf9t6jhmwsn59zft] you 
mentioned that "If HA enabled, the internal jobmanager rpc service will not be 
created. Instead, the TaskManager retrieves the JobManager address via HA 
services and connects to it via pod ip."

Do you know whether we can change the way TaskManager connects to HA services 
(do not use ip address and instead use service name)?

I think we cannot add the akka workaround of bypassing the istio sidecar. 
Thanks for the info ~

 

> Adjust Job and Task manager port definitions to work with Istio+mTLS
> --------------------------------------------------------------------
>
>                 Key: FLINK-28171
>                 URL: https://issues.apache.org/jira/browse/FLINK-28171
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.14.4
>         Environment: flink-kubernetes-operator 1.0.0
> Flink 1.14-java11
> Kubernetes v1.19.5
> Istio 1.7.6
>            Reporter: Moshe Elisha
>            Assignee: Moshe Elisha
>            Priority: Major
>              Labels: pull-request-available
>
> Hello,
>  
> We are launching Flink deployments using the [Flink Kubernetes 
> Operator|https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-stable/]
>  on a Kubernetes cluster with Istio and mTLS enabled.
>  
> We found that the TaskManager is unable to communicate with the JobManager on 
> the jobmanager-rpc port:
>  
> {{2022-06-15 15:25:40,508 WARN  akka.remote.ReliableDeliverySupervisor        
>                [] - Association with remote system 
> [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123]
>  has failed, address is now gated for [50] ms. Reason: [Association failed 
> with 
> [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123]]
>  Caused by: [The remote system explicitly disassociated (reason unknown).]}}
>  
> The reason for the issue is that the JobManager service port definitions are 
> not following the Istio guidelines 
> [https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/]
>  (see example below).
>  
> There was also an email discussion around this topic in the users mailing 
> group under the subject "Flink Kubernetes Operator with K8S + Istio + mTLS - 
> port definitions".
> With the help of the community, we were able to work around the issue, but 
> doing so was difficult and forced us to bypass the Istio proxy, which is not 
> ideal.
>  
> We would like you to consider changing the default port definitions in one 
> of the following ways:
>  # Rename the ports – I understand this is an Istio-specific guideline, but 
> maybe it is better to be aligned with at least one (popular) vendor guideline 
> instead of none at all.
>  # Add the “appProtocol” property[1], which is not specific to any vendor but 
> requires Kubernetes >= 1.19, where it is available as beta; it became stable 
> in 1.20. The option to set the appProtocol property was added only in 
> [https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0] with 
> [#3570|https://github.com/fabric8io/kubernetes-client/issues/3570].
>  # Or allow a way to override the defaults.
>  
> [1] [https://kubernetes.io/docs/concepts/services-networking/_print/#application-protocol]
>  
>  
> {{# k get service inference-results-to-analytics-engine -o yaml}}
> {{apiVersion: v1}}
> {{kind: Service}}
> {{...}}
> {{spec:}}
> {{  clusterIP: None}}
> {{  ports:}}
> {{  - name: jobmanager-rpc *# should start with "tcp-" or add "appProtocol" 
> property*}}
> {{    port: 6123}}
> {{    protocol: TCP}}
> {{    targetPort: 6123}}
> {{  - name: blobserver *# should start with "tcp-" or add "appProtocol" 
> property*}}
> {{    port: 6124}}
> {{    protocol: TCP}}
> {{    targetPort: 6124}}
> {{...}}
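
For illustration only, the two port-level options from the issue description might look like the sketch below. It is not the reporter's manifest and not what Flink generates today: the Service name and port numbers are copied from the example above, while the renamed port `tcp-jobmanager-rpc` and the choice of which port gets which option are assumptions made to show both variants side by side.

```yaml
# Sketch under stated assumptions; not a Flink default and not an applied fix.
apiVersion: v1
kind: Service
metadata:
  name: inference-results-to-analytics-engine
spec:
  clusterIP: None
  ports:
  # Option 1: put the protocol prefix in the port name (Istio naming convention).
  - name: tcp-jobmanager-rpc   # hypothetical name chosen for illustration
    port: 6123
    protocol: TCP
    targetPort: 6123
  # Option 2: keep the existing name and declare the application protocol
  # explicitly (vendor-neutral field, needs Kubernetes >= 1.19).
  - name: blobserver
    appProtocol: tcp
    port: 6124
    protocol: TCP
    targetPort: 6124
```

Either form should let Istio treat the port as plain TCP rather than relying on protocol detection. Note that this only covers traffic addressed through the Service; it would not by itself change how a TaskManager connects when it resolves the JobManager by pod IP, as discussed in the comment above.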



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
