Could you please try again with high availability enabled [1]?

If HA is enabled, the internal JobManager RPC service will not be created.
Instead, the TaskManager retrieves the JobManager address via the HA services
and connects to it directly via the pod IP.
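
As a rough sketch of what enabling HA looks like in the FlinkDeployment spec,
assuming the Kubernetes HA services used in the example [1]: the two
high-availability keys below are standard Flink options, while the storage
path is only a placeholder and should point to durable storage that survives
JobManager restarts.

```yaml
# Fragment of a FlinkDeployment spec; all other fields are omitted.
spec:
  flinkConfiguration:
    # Use the Kubernetes-based HA services (leader election via ConfigMaps).
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    # Placeholder path (assumption): replace with your own durable storage.
    high-availability.storageDir: file:///flink-data/ha
```

With these set, the TaskManagers should look up the JobManager leader address
through the HA services rather than through the jobmanager-rpc service.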

[1].
https://github.com/apache/flink-kubernetes-operator/blob/main/examples/basic-checkpoint-ha.yaml


Best,
Yang

Elisha, Moshe (Nokia - IL/Kfar Sava) <moshe.eli...@nokia.com> wrote on
Thu, Jun 16, 2022 at 15:24:

> Hello,
>
>
>
> We are launching Flink deployments using the Flink Kubernetes Operator
> <https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-stable/>
> on a Kubernetes cluster with Istio and mTLS enabled.
>
>
>
> We found that the TaskManager is unable to communicate with the JobManager
> on the jobmanager-rpc port:
>
>
>
> 2022-06-15 15:25:40,508 WARN  akka.remote.ReliableDeliverySupervisor
>                   [] - Association with remote system
> [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123]
> has failed, address is now gated for [50] ms. Reason: [Association failed
> with [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123]]
> Caused by: [The remote system explicitly disassociated (reason unknown).]
>
>
>
> The reason for the issue is that the JobManager service port definitions are
> not following the Istio guidelines
> https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/
> (see example below).
>
>
>
> We believe a change to the default port definitions is needed but for now,
> is there an immediate action we can take to work around the issue? Perhaps
> overriding the default port definitions somehow?
>
>
>
> Thanks.
>
>
>
>
>
> flink-kubernetes-operator 1.0.0
>
> Flink 1.14-java11
>
> Kubernetes v1.19.5
>
> Istio 1.7.6
>
>
>
>
>
> # k get service inference-results-to-analytics-engine -o yaml
> apiVersion: v1
> kind: Service
> metadata:
> ...
>   labels:
>     app: inference-results-to-analytics-engine
>     type: flink-native-kubernetes
>   name: inference-results-to-analytics-engine
> spec:
>   clusterIP: None
>   ports:
>   - name: jobmanager-rpc # should start with "tcp-" or add "appProtocol" property
>     port: 6123
>     protocol: TCP
>     targetPort: 6123
>   - name: blobserver # should start with "tcp-" or add "appProtocol" property
>     port: 6124
>     protocol: TCP
>     targetPort: 6124
>   selector:
>     app: inference-results-to-analytics-engine
>     component: jobmanager
>     type: flink-native-kubernetes
>   sessionAffinity: None
>   type: ClusterIP
> status:
>   loadBalancer: {}
>
>
>
