I meant that your jobmanager also received a SIGTERM signal, and you need to figure out where it came from.
To be specific, this log line:
2022-10-11 22:11:21,683 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
I believe this is from the jobmanager log, as `ClusterEntrypoint` is a class used only by the jobmanager.

Hi,
Which deployment mode do you use? What is the Flink version? I don't think killing TaskManagers makes the JobManager restart. Could you provide the whole log as an attachment so we can investigate?

Hi Xintong Song,
Thanks for your immediate reply. Yes, I do restart the task manager via the kill command and then flink restart, because I have seen cases where a simple flink restart does not pick up the latest configuration. What I am confused about is why killing the task manager process and then restarting it causes the job manager to stop and restart.
Regards, Puneet
The log shows that the jobmanager received a SIGTERM signal from an external source. Depending on how you deploy Flink, that could be a 'kill <PID>' command, a Kubernetes pod removal or eviction, etc. You may want to check where the signal came from.

Hi,
I am facing an issue where restarting the task manager after some configuration changes causes the leader job manager to restart as well, even though the task manager itself restarts successfully with the updated configuration. Pasting the leader job manager logs here:
2022-10-11 22:11:02,207 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@<TM-IP>:35376] has failed, address is now gated for [50] ms. Reason: [Disassociated]
2022-10-11 22:11:02,411 WARN akka.remote.transport.netty.NettyTransport [] - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /<TM-IP>:35376
2022-10-11 22:11:02,413 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@<TM-IP>:35376] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@<TM-IP>:35376]] Caused by: [java.net.ConnectException: Connection refused: /<TM-IP>:35376]
2022-10-11 22:11:02,682 WARN akka.remote.transport.netty.NettyTransport [] - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /<TM-IP>:35376
2022-10-11 22:11:02,683 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@<TM-IP>:35376] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@<TM-IP>:35376]] Caused by: [java.net.ConnectException: Connection refused: /<TM-IP>:35376]
2022-10-11 22:11:12,702 WARN akka.remote.transport.netty.NettyTransport [] - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /<TM-IP>:35376
2022-10-11 22:11:12,703 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@<TM-IP>:35376] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@<TM-IP>:35376]] Caused by: [java.net.ConnectException: Connection refused: /<TM-IP>:35376]
2022-10-11 22:11:21,683 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2022-10-11 22:11:21,687 INFO org.apache.flink.runtime.blob.BlobServer [] - Stopped BLOB server at 0.0.0.0:33887
Regards,
Puneet
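
For anyone following the advice in this thread, here is a minimal Python sketch for pulling the "RECEIVED SIGNAL" entries out of a jobmanager log, so the timestamp can be compared with whatever was run on the host at that moment. The regular expression is modeled only on the log line quoted above; the function name `find_signals` is made up for illustration, and other Flink versions or logging layouts may need a different pattern.

```python
import re

# Pattern modeled on the log line quoted in this thread (assumption: other
# Flink versions or custom log layouts may format the line differently).
SIGNAL_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+\w+\s+"
    r"(?P<logger>\S+)\s+\[\] - RECEIVED SIGNAL (?P<num>\d+): (?P<name>\w+)"
)


def find_signals(lines):
    """Return (timestamp, logger, signal number, signal name) for each match.

    `lines` can be any iterable of strings, for example an open log file.
    """
    hits = []
    for line in lines:
        match = SIGNAL_RE.match(line)
        if match:
            hits.append(
                (
                    match.group("ts"),
                    match.group("logger"),
                    int(match.group("num")),
                    match.group("name"),
                )
            )
    return hits


# Two lines copied from the log pasted in this thread.
sample = [
    "2022-10-11 22:11:21,683 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.",
    "2022-10-11 22:11:21,687 INFO org.apache.flink.runtime.blob.BlobServer [] - Stopped BLOB server at 0.0.0.0:33887",
]

for ts, logger, num, name in find_signals(sample):
    print(f"{ts}: {logger} received signal {num} ({name})")
```

To run it against a real log, pass an open file instead of `sample`, e.g. `find_signals(open(path))`, where `path` is wherever your deployment writes the jobmanager log. The log itself does not say who sent the signal, so the timestamp is only a starting point for checking shell history, restart scripts, or cluster events.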