Hi,

Which deployment mode do you use? What is the Flink version?
I don't think killing TaskManagers would make the JobManager restart. Could
you attach the complete log so that we can investigate?
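
Before sending the log, it may help to confirm exactly when the shutdown signal arrived, so it can be compared with the time the kill command was run. Below is a minimal sketch in Python, not part of Flink. The function name `find_signals` and the sample lines are only for illustration, and it assumes the log uses the same timestamp and "RECEIVED SIGNAL" format as the excerpt quoted in this thread. It joins wrapped continuation lines back onto the record they belong to before searching.

```python
import re

# Start of a log record, e.g. "2022-10-11 22:11:21,683 INFO ..."
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
# Signal message as it appears in the quoted excerpt, e.g. "RECEIVED SIGNAL 15: SIGTERM"
SIGNAL = re.compile(r"RECEIVED\s+SIGNAL\s+(\d+):\s+(\w+)")


def find_signals(lines):
    """Return (timestamp, signal number, signal name) for each signal record."""
    records = []
    for raw in lines:
        line = raw.strip()
        if TIMESTAMP.match(line):
            # A new record starts with a timestamp.
            records.append(line)
        elif records:
            # Anything else is treated as a wrapped continuation line.
            records[-1] += " " + line

    found = []
    for record in records:
        hit = SIGNAL.search(record)
        if hit:
            timestamp = TIMESTAMP.match(record).group(0)
            found.append((timestamp, int(hit.group(1)), hit.group(2)))
    return found


# Sample input modelled on the excerpt in this thread (wrapped as in the mail).
sample = [
    "2022-10-11 22:11:12,702 WARN  akka.remote.transport.netty.NettyTransport",
    "                 [] - Remote connection to [null] failed with",
    "2022-10-11 22:11:21,683 INFO",
    "org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - RECEIVED",
    "SIGNAL 15: SIGTERM. Shutting down as requested.",
]

for timestamp, number, name in find_signals(sample):
    print(timestamp, number, name)
```

To run it on a real file, pass the open file object instead of `sample`, for example `find_signals(open(path))` with `path` pointing at your JobManager log.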

On Wed, 12 Oct 2022 at 6:01 PM, Puneet Duggal <puneetduggal1...@gmail.com>
wrote:

> Hi Xintong Song,
>
> Thanks for your immediate reply. Yes, I restart the task manager via the kill
> command and then a flink restart, because I have seen cases where a simple
> flink restart does not pick up the latest configuration. What confuses me is
> why killing the task manager process and then restarting it causes the job
> manager to stop and restart.
>
> Regards,
> Puneet
>
>
> On 12-Oct-2022, at 7:33 AM, Xintong Song <tonysong...@gmail.com> wrote:
>
> The log shows that the jobmanager received a SIGTERM signal from an external
> source. Depending on how you deploy Flink, that could be a 'kill <PID>'
> command, a Kubernetes pod removal / eviction, etc. You may want to check
> where the signal came from.
>
> Best,
> Xintong
>
>
>
> On Wed, Oct 12, 2022 at 6:26 AM Puneet Duggal <puneetduggal1...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am facing an issue where restarting a task manager after some
>> configuration changes also causes the leader job manager to restart, even
>> though the task manager itself restarts successfully with the updated
>> configuration. The leader job manager logs are pasted here:
>>
>>
>> 2022-10-11 22:11:02,207 WARN  akka.remote.ReliableDeliverySupervisor
>>                  [] - Association with remote system [
>> akka.tcp://flink@<TM-IP>:35376] has failed, address is now gated for
>> [50] ms. Reason: [Disassociated]
>> 2022-10-11 22:11:02,411 WARN  akka.remote.transport.netty.NettyTransport
>>                  [] - Remote connection to [null] failed with
>> java.net.ConnectException: Connection refused: /<TM-IP>:35376
>> 2022-10-11 22:11:02,413 WARN  akka.remote.ReliableDeliverySupervisor
>>                  [] - Association with remote system [
>> akka.tcp://flink@<TM-IP>:35376] has failed, address is now gated for
>> [50] ms. Reason: [Association failed with [akka.tcp://flink@<TM-IP>:35376]]
>> Caused by: [java.net.ConnectException: Connection refused: /<TM-IP>:35376]
>> 2022-10-11 22:11:02,682 WARN  akka.remote.transport.netty.NettyTransport
>>                  [] - Remote connection to [null] failed with
>> java.net.ConnectException: Connection refused: /<TM-IP>:35376
>> 2022-10-11 22:11:02,683 WARN  akka.remote.ReliableDeliverySupervisor
>>                  [] - Association with remote system [
>> akka.tcp://flink@<TM-IP>:35376] has failed, address is now gated for
>> [50] ms. Reason: [Association failed with [akka.tcp://flink@<TM-IP>:35376]]
>> Caused by: [java.net.ConnectException: Connection refused: /<TM-IP>:35376]
>> 2022-10-11 22:11:12,702 WARN  akka.remote.transport.netty.NettyTransport
>>                  [] - Remote connection to [null] failed with
>> java.net.ConnectException: Connection refused: /<TM-IP>:35376
>> 2022-10-11 22:11:12,703 WARN  akka.remote.ReliableDeliverySupervisor
>>                  [] - Association with remote system [
>> akka.tcp://flink@<TM-IP>:35376] has failed, address is now gated for
>> [50] ms. Reason: [Association failed with [akka.tcp://flink@<TM-IP>:35376]]
>> Caused by: [java.net.ConnectException: Connection refused: /<TM-IP>:35376]
>> 2022-10-11 22:11:21,683 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - RECEIVED
>> SIGNAL 15: SIGTERM. Shutting down as requested.
>> 2022-10-11 22:11:21,687 INFO  org.apache.flink.runtime.blob.BlobServer
>>                  [] - Stopped BLOB server at 0.0.0.0:33887
>>
>>
>> Regards,
>> Puneet
>>
>>
>>
>
