Hi,

Which deployment mode do you use, and which Flink version? I don't think killing TaskManagers alone would make the JobManager restart. Could you provide the whole log as an attachment so we can investigate?
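In the meantime, it may help to measure how long after the first TaskManager disconnect the SIGTERM arrived, and to compare that with when the restart command or script ran. Below is a rough sketch, not a Flink tool: the helper name `signal_gap_seconds` is made up for illustration, and it assumes the raw JobManager log has one entry per line (the sample reuses two entries from the excerpt quoted below, unwrapped).

```python
import re
from datetime import datetime

# Matches the log4j-style timestamp seen in the quoted log, e.g. "2022-10-11 22:11:02,207".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def parse_timestamp(line):
    """Return the first timestamp found in a log line, or None if there is none."""
    match = TIMESTAMP.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S,%f")


def signal_gap_seconds(log_text):
    """Seconds between the first 'Disassociated' entry and the SIGTERM entry.

    Hypothetical helper: returns None if either entry is missing.
    """
    first_disconnect = None
    for line in log_text.splitlines():
        ts = parse_timestamp(line)
        if ts is None:
            continue
        if first_disconnect is None and "Disassociated" in line:
            first_disconnect = ts
        if "RECEIVED SIGNAL 15: SIGTERM" in line:
            if first_disconnect is None:
                return None
            return (ts - first_disconnect).total_seconds()
    return None


# Two entries taken from the log excerpt in this thread, one entry per line.
SAMPLE_LOG = (
    "2022-10-11 22:11:02,207 WARN akka.remote.ReliableDeliverySupervisor [] - "
    "Association with remote system [akka.tcp://flink@<TM-IP>:35376] has failed, "
    "address is now gated for [50] ms. Reason: [Disassociated]\n"
    "2022-10-11 22:11:21,683 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - "
    "RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.\n"
)

if __name__ == "__main__":
    gap = signal_gap_seconds(SAMPLE_LOG)
    print(f"SIGTERM arrived {gap:.3f} s after the first TaskManager disconnect")
```

For the excerpt in this thread the gap is about 19.5 seconds. That number alone does not say who sent the signal, but it narrows down which command or script to look at.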
On Wed, 12 Oct 2022 at 6:01 PM, Puneet Duggal <puneetduggal1...@gmail.com> wrote:

> Hi Xintong Song,
>
> Thanks for your immediate reply. Yes, I do restart task manager via kill
> command and then flink restart because I have seen cases where simple flink
> restart does not pickup the latest configuration. But what I am confused
> about is why killing the task manager process and then restarting it is
> causing the job manager to stop and restart.
>
> Regards,
> Puneet
>
>
> On 12-Oct-2022, at 7:33 AM, Xintong Song <tonysong...@gmail.com> wrote:
>
> The log shows that the jobmanager received a SIGTERM signal from external.
> Depending on how you deploy Flink, that could be a 'kill <PID>' command, or
> a kubernetes pod removal / eviction, etc. You may want to check where the
> signal came from.
>
> Best,
> Xintong
>
>
> On Wed, Oct 12, 2022 at 6:26 AM Puneet Duggal <puneetduggal1...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am facing an issue where when restarting task manager after adding some
>> configuration changes, even though task manager restarts successfully with
>> the updated configuration change, is causing the leader job manager to
>> restart as well. Pasting the leader job manager logs here
>>
>>
>> 2022-10-11 22:11:02,207 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@<TM-IP>:35376] has failed, address is now gated for [50] ms. Reason: [Disassociated]
>> 2022-10-11 22:11:02,411 WARN akka.remote.transport.netty.NettyTransport [] - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /<TM-IP>:35376
>> 2022-10-11 22:11:02,413 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@<TM-IP>:35376] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@<TM-IP>:35376]] Caused by: [java.net.ConnectException: Connection refused: /<TM-IP>:35376]
>> 2022-10-11 22:11:02,682 WARN akka.remote.transport.netty.NettyTransport [] - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /<TM-IP>:35376
>> 2022-10-11 22:11:02,683 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@<TM-IP>:35376] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@<TM-IP>:35376]] Caused by: [java.net.ConnectException: Connection refused: /<TM-IP>:35376]
>> 2022-10-11 22:11:12,702 WARN akka.remote.transport.netty.NettyTransport [] - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /<TM-IP>:35376
>> 2022-10-11 22:11:12,703 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@<TM-IP>:35376] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@<TM-IP>:35376]] Caused by: [java.net.ConnectException: Connection refused: /<TM-IP>:35376]
>> 2022-10-11 22:11:21,683 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
>> 2022-10-11 22:11:21,687 INFO org.apache.flink.runtime.blob.BlobServer [] - Stopped BLOB server at 0.0.0.0:33887
>>
>>
>> Regards,
>> Puneet