Hi Liangde,

   I'm pulling in Yang Wang, who is the expert on Flink on K8s.

Best,
 Yun

 ------------------Original Mail ------------------
Sender:Chen Liangde <lian...@gmail.com>
Send Date:Fri Oct 30 05:30:40 2020
Recipients:Flink ML <user@flink.apache.org>
Subject:Native kubernetes setup failed to start job

I created a Flink cluster on Kubernetes following this guide: 
https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html
The job manager was running. When I submitted a job to the job manager, it 
spawned a task manager pod, but the task manager failed to connect to the job 
manager, and the task manager does not show up in the job manager web UI.
This error is suspicious: 
org.apache.flink.shaded.akka.org.jboss.netty.handler.codec.frame.TooLongFrameException: Adjusted frame length exceeds 10485760: 352518404 - discarded

Task manager log:
2020-10-29 13:22:51,069 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Connecting to ResourceManager akka.tcp://fl...@detection-engine-dev.team-anti-cheat:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000).
2020-10-29 13:22:51,176 WARN  akka.remote.transport.netty.NettyTransport [] - Remote connection to [detection-engine-dev.team-anti-cheat/10.123.155.112:6123] failed with java.io.IOException: Connection reset by peer
2020-10-29 13:22:51,176 WARN  akka.remote.transport.netty.NettyTransport [] - Remote connection to [detection-engine-dev.team-anti-cheat/10.123.155.112:6123] failed with org.apache.flink.shaded.akka.org.jboss.netty.handler.codec.frame.TooLongFrameException: Adjusted frame length exceeds 10485760: 352518404 - discarded
2020-10-29 13:22:51,180 WARN  akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://fl...@detection-engine-dev.team-anti-cheat:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://fl...@detection-engine-dev.team-anti-cheat:6123]] Caused by: [The remote system explicitly disassociated (reason unknown).]
2020-10-29 13:22:51,183 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://fl...@detection-engine-dev.team-anti-cheat:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://fl...@detection-engine-dev.team-anti-cheat:6123/user/rpc/resourcemanager_*.
2020-10-29 13:23:01,203 WARN  akka.remote.transport.netty.NettyTransport [] - Remote connection to [detection-engine-dev.team-anti-cheat/10.123.155.112:6123] failed with java.io.IOException: Connection reset by peer
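
One way to read the oversized length in the TooLongFrameException is to check which bytes were parsed as the frame-length prefix. The following is only a rough sketch under two assumptions that are not confirmed anywhere in this thread: that the transport uses a 4-byte big-endian length field, and that the reported "adjusted" length includes those 4 bytes. The helper name leading_bytes is made up for illustration.

```python
import struct

ADJUSTED_LENGTH = 352518404  # value reported in the TooLongFrameException
LENGTH_FIELD_BYTES = 4       # assumption: 4-byte big-endian length prefix


def leading_bytes(adjusted_length: int, field_bytes: int = LENGTH_FIELD_BYTES) -> bytes:
    """Recover the bytes that were interpreted as the frame-length prefix.

    Assumes the adjusted length equals the raw length field plus the size
    of the length field itself.
    """
    raw_length = adjusted_length - field_bytes
    return struct.pack(">I", raw_length)


prefix = leading_bytes(ADJUSTED_LENGTH)
print(prefix.hex(" "))  # prints "15 03 01 00"
```

If those assumptions hold, the first bytes received were 15 03 01 00, which resemble the header of a TLS alert record rather than a plain RPC frame. That might point to one side of the connection on port 6123 speaking TLS while the other does not, but this is a guess and not a confirmed diagnosis.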
