[ https://issues.apache.org/jira/browse/FLINK-28171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558329#comment-17558329 ]
Moshe Elisha commented on FLINK-28171:
--------------------------------------

Thanks [~martijnvisser] and [~wangyang0918] for sharing your view.

I assume that simply adding `appProtocol` on Kubernetes < 1.19 will result in a ValidationError ("unknown field"). That said, I believe we can use the Kubernetes client in flink-kubernetes to check the Kubernetes server version and add `appProtocol` only if the version is >= 1.19.

> Adjust Job and Task manager port definitions to work with Istio+mTLS
> --------------------------------------------------------------------
>
> Key: FLINK-28171
> URL: https://issues.apache.org/jira/browse/FLINK-28171
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / Kubernetes
> Affects Versions: 1.14.4
> Environment: flink-kubernetes-operator 1.0.0
> Flink 1.14-java11
> Kubernetes v1.19.5
> Istio 1.7.6
> Reporter: Moshe Elisha
> Priority: Major
>
> Hello,
>
> We are launching Flink deployments using the [Flink Kubernetes Operator|https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-stable/] on a Kubernetes cluster with Istio and mTLS enabled.
>
> We found that the TaskManager is unable to communicate with the JobManager on the jobmanager-rpc port:
>
> {{2022-06-15 15:25:40,508 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123]] Caused by: [The remote system explicitly disassociated (reason unknown).]}}
>
> The reason for the issue is that the JobManager service port definitions do not follow the Istio guidelines [https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/] (see example below).
>
> There was also an email discussion around this topic on the user mailing list under the subject "Flink Kubernetes Operator with K8S + Istio + mTLS - port definitions".
> With the help of the community, we were able to work around the issue, but it was very hard and forced us to bypass the Istio proxy, which is not ideal.
>
> We would like you to consider changing the default port definitions, either:
> # Rename the ports. I understand this is an Istio-specific guideline, but it may be better to be aligned with at least one (popular) vendor's guideline than with none at all.
> # Add the "appProtocol" property[1], which is not specific to any vendor but requires Kubernetes >= 1.19, where it was introduced as beta (it became stable in 1.20). Support for setting the appProtocol property was added to the fabric8 client only in [https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0] with [#3570|https://github.com/fabric8io/kubernetes-client/issues/3570].
> # Or allow a way to override the defaults.
>
> [1] [https://kubernetes.io/docs/concepts/services-networking/_print/#application-protocol]
>
> {{# k get service inference-results-to-analytics-engine -o yaml}}
> {{apiVersion: v1}}
> {{kind: Service}}
> {{...}}
> {{spec:}}
> {{  clusterIP: None}}
> {{  ports:}}
> {{  - name: jobmanager-rpc *# should start with "tcp-" or add "appProtocol" property*}}
> {{    port: 6123}}
> {{    protocol: TCP}}
> {{    targetPort: 6123}}
> {{  - name: blobserver *# should start with "tcp-" or add "appProtocol" property*}}
> {{    port: 6124}}
> {{    protocol: TCP}}
> {{    targetPort: 6124}}
> {{...}}

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
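A minimal Java sketch of the version gate proposed in the comment on FLINK-28171 (add `appProtocol` only when the server is >= 1.19). It uses only the JDK and does not call the fabric8 client. The `major` and `minor` strings are assumed to be the values reported by the API server's `/version` endpoint. `PortSpec`, `supportsAppProtocol` and `jobManagerPorts` are hypothetical names, not Flink or fabric8 APIs. The `"tcp"` value for `appProtocol` is an assumption based on the Istio protocol-selection guide linked in the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppProtocolGate {

    // Hypothetical stand-in for one Service port entry (not a fabric8 type).
    static final class PortSpec {
        final String name;
        final int port;
        final String protocol;
        final String appProtocol; // null means "omit the field"

        PortSpec(String name, int port, String protocol, String appProtocol) {
            this.name = name;
            this.port = port;
            this.protocol = protocol;
            this.appProtocol = appProtocol;
        }

        @Override
        public String toString() {
            return name + ":" + port + "/" + protocol
                    + (appProtocol == null ? "" : " appProtocol=" + appProtocol);
        }
    }

    private static final Pattern LEADING_DIGITS = Pattern.compile("^(\\d+)");

    // Parses only the leading digits, since some clusters report minors like "19+".
    private static int leadingInt(String component) {
        Matcher m = LEADING_DIGITS.matcher(component.trim());
        if (!m.find()) {
            throw new IllegalArgumentException("Unparseable version component: " + component);
        }
        return Integer.parseInt(m.group(1));
    }

    // appProtocol is beta in Kubernetes 1.19 and stable in 1.20, so gate on >= 1.19.
    static boolean supportsAppProtocol(String major, String minor) {
        int maj = leadingInt(major);
        int min = leadingInt(minor);
        return maj > 1 || (maj == 1 && min >= 19);
    }

    // Builds the two JobManager ports from the issue; appProtocol is set only when supported.
    static List<PortSpec> jobManagerPorts(String major, String minor) {
        String appProtocol = supportsAppProtocol(major, minor) ? "tcp" : null;
        List<PortSpec> ports = new ArrayList<>();
        ports.add(new PortSpec("jobmanager-rpc", 6123, "TCP", appProtocol));
        ports.add(new PortSpec("blobserver", 6124, "TCP", appProtocol));
        return ports;
    }

    public static void main(String[] args) {
        System.out.println(jobManagerPorts("1", "19"));  // version from the issue environment
        System.out.println(jobManagerPorts("1", "18+")); // older cluster: field omitted
    }
}
```

Parsing only the leading digits of the minor version keeps the check working on managed clusters that report values such as "19+". Renaming the ports with a "tcp-" prefix (option 1 in the issue) would need no version check at all, at the cost of following an Istio-specific convention.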