Hello,
We are launching Flink deployments using the Flink Kubernetes
Operator <https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-stable/>
on a Kubernetes cluster with Istio and mTLS enabled.
We found that the TaskManager cannot communicate with the JobManager on
the jobmanager-rpc port (6123):
2022-06-15 15:25:40,508 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://[email protected]:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://[email protected]:6123]] Caused by: [The remote system explicitly disassociated (reason unknown).]
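The cause is that the JobManager Service port definitions do not follow the
Istio protocol-selection guidelines
https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/
Each port name should start with a protocol prefix such as "tcp-", or the port
should declare its protocol with the "appProtocol" field; neither
jobmanager-rpc nor blobserver does either (see the Service definition below).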
We believe the default port definitions need to change, but in the meantime,
is there an immediate workaround? For example, is there a way to override the
default port definitions?
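As a stopgap we are considering renaming the two ports in place with a strategic merge patch like the one below. This is an untested sketch: the file name ports-patch.yaml and the "tcp-" port names are our own choices, and Flink or the operator may recreate the Service with the default names on the next redeploy, which would undo the patch.

```yaml
# ports-patch.yaml (hypothetical file name), untested.
# Strategic merge patch for the JobManager Service. Service ports are merged
# by their "port" key, so only the names of ports 6123 and 6124 change.
# Apply with:
#   kubectl patch service inference-results-to-analytics-engine --patch "$(cat ports-patch.yaml)"
# Caveat: Flink/the operator may recreate the Service with the default names.
spec:
  ports:
  - port: 6123
    name: tcp-jobmanager-rpc   # hypothetical Istio-compliant name
  - port: 6124
    name: tcp-blobserver       # hypothetical Istio-compliant name
```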
Thanks.
flink-kubernetes-operator 1.0.0
Flink 1.14-java11
Kubernetes v1.19.5
Istio 1.7.6
# k get service inference-results-to-analytics-engine -o yaml
apiVersion: v1
kind: Service
metadata:
  ...
  labels:
    app: inference-results-to-analytics-engine
    type: flink-native-kubernetes
  name: inference-results-to-analytics-engine
spec:
  clusterIP: None
  ports:
  - name: jobmanager-rpc # should start with "tcp-" or add "appProtocol" property
    port: 6123
    protocol: TCP
    targetPort: 6123
  - name: blobserver # should start with "tcp-" or add "appProtocol" property
    port: 6124
    protocol: TCP
    targetPort: 6124
  selector:
    app: inference-results-to-analytics-engine
    component: jobmanager
    type: flink-native-kubernetes
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
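For comparison, here is a sketch of what we understand compliant port definitions would look like under the Istio guideline; either option on its own should be enough, and they are mixed here only to show both. The port names are our own choice, and we have not confirmed that Istio 1.7.6 honors appProtocol on Kubernetes v1.19.5.

```yaml
spec:
  ports:
  # Option 1: protocol prefix in the port name (hypothetical name).
  - name: tcp-jobmanager-rpc
    port: 6123
    protocol: TCP
    targetPort: 6123
  # Option 2: keep the default name and declare the protocol explicitly.
  - name: blobserver
    appProtocol: tcp
    port: 6124
    protocol: TCP
    targetPort: 6124
```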