Could you share the JobManager logs so that we can check whether it
received the registration from the TaskManager?

In a non-HA Flink cluster, the TaskManager uses a Kubernetes service to
talk to the JobManager.
Currently, Flink creates a headless service for the JobManager. You can
find it with `kubectl get svc`, and then start a busybox pod to check the
network connectivity to that service.
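
As a rough sketch of such a connectivity check: the snippet below only tests
whether a TCP connection to the JobManager RPC port can be opened. It assumes
a pod with Python 3 in the same namespace (the busybox image does not ship
Python), and `can_connect` is a hypothetical helper name, not part of Flink.
The host and port in the usage comment are taken from the log quoted below;
substitute the service name reported by `kubectl get svc`.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, resets and timeouts.
        return False


# Example usage (host and port as they appear in the TaskManager log):
# print(can_connect("detection-engine-dev.team-anti-cheat", 6123))
```

A successful connect only shows that the port is reachable through the
service. It does not prove that the Akka RPC handshake itself succeeds.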

Could you also share more information about your environment? I could
not reproduce your issue in a typical K8s cluster.

Best,
Yang

Yun Gao <yungao...@aliyun.com> wrote on Fri, Oct 30, 2020 at 11:53 AM:

> Hi Liangde,
>
>    I am pulling in Yang Wang, who is the expert on Flink on K8s.
>
> Best,
>  Yun
>
> ------------------Original Mail ------------------
> *Sender:*Chen Liangde <lian...@gmail.com>
> *Send Date:*Fri Oct 30 05:30:40 2020
> *Recipients:*Flink ML <user@flink.apache.org>
> *Subject:*Native kubernetes setup failed to start job
>
>> I created a flink cluster in kubernetes following this guide:
>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html
>>
>> The job manager was running. When a job was submitted to the job manager,
>> it spawned a task manager pod, but the task manager failed to connect to
>> the job manager, and I can't find the task manager in the job manager
>> web UI.
>>
>> This error is suspicious:
>> org.apache.flink.shaded.akka.org.jboss.netty.handler.codec.frame.TooLongFrameException:
>> Adjusted frame length exceeds 10485760: 352518404 - discarded
>>
>> 2020-10-29 13:22:51,069 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Connecting to ResourceManager akka.tcp://fl...@detection-engine-dev.team-anti-cheat:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000).
>> 2020-10-29 13:22:51,176 WARN  akka.remote.transport.netty.NettyTransport [] - Remote connection to [detection-engine-dev.team-anti-cheat/10.123.155.112:6123] failed with java.io.IOException: Connection reset by peer
>> 2020-10-29 13:22:51,176 WARN  akka.remote.transport.netty.NettyTransport [] - Remote connection to [detection-engine-dev.team-anti-cheat/10.123.155.112:6123] failed with org.apache.flink.shaded.akka.org.jboss.netty.handler.codec.frame.TooLongFrameException: Adjusted frame length exceeds 10485760: 352518404 - discarded
>> 2020-10-29 13:22:51,180 WARN  akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://fl...@detection-engine-dev.team-anti-cheat:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://fl...@detection-engine-dev.team-anti-cheat:6123]] Caused by: [The remote system explicitly disassociated (reason unknown).]
>> 2020-10-29 13:22:51,183 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://fl...@detection-engine-dev.team-anti-cheat:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://fl...@detection-engine-dev.team-anti-cheat:6123/user/rpc/resourcemanager_*.
>> 2020-10-29 13:23:01,203 WARN  akka.remote.transport.netty.NettyTransport [] - Remote connection to [detection-engine-dev.team-anti-cheat/10.123.155.112:6123] failed with java.io.IOException: Connection reset by peer
>>
>>
