Hi, I'm running Flink jobs on Kubernetes. After a day or so, the task manager and job manager lose their connection and I have to restart everything. I'm assuming that one of the pods crashed, and when the new pod starts it can't find the job manager? I also saw that this is an Akka issue and that it will be fixed in version 1.5.
How can I safely deploy jobs on Kubernetes?

Task manager logs:

> 2018-03-06 07:23:18,186 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@flink-jobmanager:6123/user/jobmanager (attempt 1594, timeout: 30000 milliseconds)
> 2018-03-06 07:23:48,196 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@flink-jobmanager:6123/user/jobmanager (attempt 1595, timeout: 30000 milliseconds)
> 2018-03-06 07:24:18,216 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@flink-jobmanager:6123/user/jobmanager (attempt 1596, timeout: 30000 milliseconds)
> 2018-03-06 07:24:48,237 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@flink-jobmanager:6123/user/jobmanager (attempt 1597, timeout: 30000 milliseconds)
> 2018-03-06 07:24:53,042 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [5000] ms. Reason: [Disassociated]

Job manager logs:

> 2018-03-06 07:25:18,262 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at flink-taskmanager-3509325052-bqtkd (akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073/user/taskmanager) as c37614c28df29d34b80676488e386da3. Current number of registered hosts is 2. Current number of alive task slots is 16.
> 2018-03-06 07:25:18,263 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by: [flink-taskmanager-3509325052-bqtkd: Temporary failure in name resolution]
> 2018-03-06 07:25:23,282 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by: [flink-taskmanager-3509325052-bqtkd]
> 2018-03-06 07:25:28,303 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by: [flink-taskmanager-3509325052-bqtkd: Temporary failure in name resolution]
> 2018-03-06 07:25:33,322 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by: [flink-taskmanager-3509325052-bqtkd]
> 2018-03-06 07:25:38,343 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by: [flink-taskmanager-3509325052-bqtkd: Temporary failure in name resolution]
> 2018-03-06 07:25:43,362 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by: [flink-taskmanager-3509325052-bqtkd]
> 2018-03-06 07:25:48,383 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by: [flink-taskmanager-3509325052-bqtkd: Temporary failure in name resolution]
> 2018-03-06 07:25:53,402 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by: [flink-taskmanager-3509325052-bqtkd]
> 2018-03-06 07:25:58,423 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by: [flink-taskmanager-3509325052-bqtkd: Temporary failure in name resolution]
> 2018-03-06 07:26:03,442 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by: [flink-taskmanager-3509325052-bqtkd]
> 2018-03-06 07:26:08,463 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by: [flink-taskmanager-3509325052-bqtkd: Temporary failure in name resolution]
> 2018-03-06 07:26:13,482 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by: [flink-taskmanager-3509325052-bqtkd]
> 2018-03-06 07:26:18,504 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by: [flink-taskmanager-3509325052-bqtkd: Temporary failure in name resolution]