Hello,

We are launching Flink deployments using the Flink Kubernetes Operator (https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-stable/) on a Kubernetes cluster with Istio and mTLS enabled.
We found that the TaskManager is unable to communicate with the JobManager on the jobmanager-rpc port:

2022-06-15 15:25:40,508 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123]] Caused by: [The remote system explicitly disassociated (reason unknown).]

The reason for the issue is that the JobManager service port definitions are not following the Istio guidelines https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/ (see example below).

We believe a change to the default port definitions is needed but for now, is there an immediate action we can take to work around the issue? Perhaps overriding the default port definitions somehow?

Thanks.

flink-kubernetes-operator 1.0.0
Flink 1.14-java11
Kubernetes v1.19.5
Istio 1.7.6

# k get service inference-results-to-analytics-engine -o yaml
apiVersion: v1
kind: Service
metadata:
  ...
  labels:
    app: inference-results-to-analytics-engine
    type: flink-native-kubernetes
  name: inference-results-to-analytics-engine
spec:
  clusterIP: None
  ports:
  - name: jobmanager-rpc  # should start with "tcp-" or add "appProtocol" property
    port: 6123
    protocol: TCP
    targetPort: 6123
  - name: blobserver  # should start with "tcp-" or add "appProtocol" property
    port: 6124
    protocol: TCP
    targetPort: 6124
  selector:
    app: inference-results-to-analytics-engine
    component: jobmanager
    type: flink-native-kubernetes
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
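
For comparison, here is a rough sketch of what we would expect the ports section to look like if it followed the Istio protocol-selection guidelines. This is only an illustration of the two options mentioned in the comments above, not something Flink or the operator generates today. The "tcp-" prefixed names in the comments are hypothetical, and we have not verified that Flink tolerates renamed ports.

```yaml
# Sketch only: assumed shape of an Istio-friendly ports section.
# Option A (shown): keep the existing port names and declare the
# protocol explicitly through the Kubernetes "appProtocol" field.
spec:
  ports:
  - name: jobmanager-rpc
    appProtocol: tcp      # explicit protocol declaration for Istio
    port: 6123
    protocol: TCP
    targetPort: 6123
  - name: blobserver
    appProtocol: tcp      # explicit protocol declaration for Istio
    port: 6124
    protocol: TCP
    targetPort: 6124
# Option B (not shown): rename the ports so the name starts with the
# protocol, e.g. "tcp-jobmanager-rpc" and "tcp-blobserver"
# (hypothetical names).
```

Option A looks less intrusive to us because the port names stay unchanged, but it relies on the cluster and the Istio version honoring the appProtocol field.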