Hello,
I launched a job with a larger load on a Hadoop YARN cluster.
The job finished after running for 5 hours. I didn't find any errors in the
JobManager log apart from this connection exception:
2021-02-20 13:20:14,110 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [/10.1.57.146:48368] failed with java.io.IOException: Connection reset by peer
2021-02-20 13:20:14,110 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink-metrics@host:35241] has failed, address is now gated for [50] ms. Reason: [Disassociated]
2021-02-20 13:20:14,110 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@host:39493] has failed, address is now gated for [50] ms. Reason: [Disassociated]
2021-02-20 13:20:14,110 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink-metrics@host:38481] has failed, address is now gated for [50] ms. Reason: [Disassociated]
Any idea what caused the job to finish and how to resolve it?
Any suggestions are appreciated.
Thanks
Best regards
Rainie