geonyeong kim created FLINK-29116:
-------------------------------------

             Summary: Tried to associate with unreachable remote address
                 Key: FLINK-29116
                 URL: https://issues.apache.org/jira/browse/FLINK-29116
             Project: Flink
          Issue Type: Bug
    Affects Versions: kubernetes-operator-1.1.0, 1.15.1
            Reporter: geonyeong kim
         Attachments: Screen Shot 2022-08-26 at 5.04.37 PM.png

Hello.

I am planning to deploy and use FlinkDeployment resources through the Flink Kubernetes operator.

The CRD, operator, webhook, etc. are all set up, and we deployed a FlinkDeployment to confirm that it operates normally.

*However, strangely, connecting to the resource manager fails when there is more than one task manager pod replica.*

I thought it might be a problem with akka, timeouts, etc., so I increased the values below, but the connection continues to fail.

- akka.retry-gate-closed-for: 10000
- akka.server-socket-worker-pool.pool-size-min: 6
- akka.server-socket-worker-pool.pool-size-max: 10
- akka.client-socket-worker-pool.pool-size-max: 10
- akka.client-socket-worker-pool.pool-size-min: 6
- blob.client.connect.

The task manager log is as follows.

{code:java}
Association with remote system [akka.tcp://flink@10.238.80.92:6123] has failed, address is now gated for [10000] ms. Reason: [Disassociated]
Could not resolve ResourceManager address akka.tcp://flink@10.238.80.92:6123/user/rpc/resourcemanager_1, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.238.80.92:6123/user/rpc/resourcemanager_1.
Tried to associate with unreachable remote address [akka.tcp://flink@10.238.80.92:6123]. Address is now gated for 10000 ms, all messages to this address will be delivered to dead letters. Reason: [The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.]
{code}

*A TCP check run from inside the task manager pod shows that the connection is open.*

*Below are the versions I used.*

*- flink image: 1.15.1*
*- flink kubernetes operator: 1.1.0*

*I would appreciate it if you could look into the problem quickly.*

*If it is a bug, please tell me how to work around it in the meantime.*

--
This message was sent by Atlassian Jira
(v8.20.10#820010)