Could you please try again with high availability enabled [1]? When HA is enabled, the internal JobManager RPC service is not created. Instead, the TaskManager retrieves the JobManager address via the HA services and connects to it directly via the pod IP.
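
For illustration, here is a minimal sketch of the HA-related part of a FlinkDeployment, modeled on the linked example [1]. The deployment name, the `v1_14` version value, and the `file:///flink-data/ha` storage path are assumptions to adapt to your setup; the storage directory must be on a volume that the JobManager and TaskManager pods can both reach.

```yaml
# Sketch only: HA-related settings for a FlinkDeployment (other required fields omitted).
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: inference-results-to-analytics-engine   # assumed: reuses the name from the quoted Service
spec:
  flinkVersion: v1_14                            # assumed: matches the reported Flink 1.14
  flinkConfiguration:
    # Use the Kubernetes HA services so TaskManagers discover the JobManager
    # through HA metadata rather than through the internal RPC service.
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    # Assumed path: where HA metadata is persisted.
    high-availability.storageDir: file:///flink-data/ha
```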
[1]. https://github.com/apache/flink-kubernetes-operator/blob/main/examples/basic-checkpoint-ha.yaml

Best,
Yang

Elisha, Moshe (Nokia - IL/Kfar Sava) <moshe.eli...@nokia.com> wrote on Thu, Jun 16, 2022 at 15:24:

> Hello,
>
> We are launching Flink deployments using the Flink Kubernetes Operator
> <https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-stable/>
> on a Kubernetes cluster with Istio and mTLS enabled.
>
> We found that the TaskManager is unable to communicate with the JobManager
> on the jobmanager-rpc port:
>
> 2022-06-15 15:25:40,508 WARN akka.remote.ReliableDeliverySupervisor
> [] - Association with remote system
> [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123]
> has failed, address is now gated for [50] ms. Reason: [Association failed
> with [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123]]
> Caused by: [The remote system explicitly disassociated (reason unknown).]
>
> The reason for the issue is that the JobManager service port definitions are
> not following the Istio guidelines
> https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/
> (see example below).
>
> We believe a change to the default port definitions is needed but for now,
> is there an immediate action we can take to work around the issue? Perhaps
> overriding the default port definitions somehow?
>
> Thanks.
>
> flink-kubernetes-operator 1.0.0
> Flink 1.14-java11
> Kubernetes v1.19.5
> Istio 1.7.6
>
> # k get service inference-results-to-analytics-engine -o yaml
> apiVersion: v1
> kind: Service
> metadata:
>   ...
>   labels:
>     app: inference-results-to-analytics-engine
>     type: flink-native-kubernetes
>   name: inference-results-to-analytics-engine
> spec:
>   clusterIP: None
>   ports:
>   - name: jobmanager-rpc # should start with "tcp-" or add "appProtocol" property
>     port: 6123
>     protocol: TCP
>     targetPort: 6123
>   - name: blobserver # should start with "tcp-" or add "appProtocol" property
>     port: 6124
>     protocol: TCP
>     targetPort: 6124
>   selector:
>     app: inference-results-to-analytics-engine
>     component: jobmanager
>     type: flink-native-kubernetes
>   sessionAffinity: None
>   type: ClusterIP
> status:
>   loadBalancer: {}
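
For reference, the port definitions that the quoted message asks for might look like the sketch below. It only illustrates the two options named in the Istio protocol-selection guidelines (a `tcp-` name prefix, or an `appProtocol` field). Flink generates this Service itself, so this shows the target shape and is not a setting you can apply directly; the name `tcp-jobmanager-rpc` is hypothetical.

```yaml
# Sketch only: Istio-friendly port definitions for the headless JobManager Service.
spec:
  clusterIP: None
  ports:
  # Option 1: prefix the port name with the protocol.
  - name: tcp-jobmanager-rpc   # hypothetical name
    port: 6123
    protocol: TCP
    targetPort: 6123
  # Option 2: keep the name and declare the protocol via appProtocol.
  - name: blobserver
    appProtocol: tcp
    port: 6124
    protocol: TCP
    targetPort: 6124
```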