Hi,

I'm running Flink jobs on Kubernetes. After a day or so, the task manager
and job manager lose their connection and I have to restart everything.
I'm assuming that one of the pods crashed and that when the new pod starts,
it can't find the job manager?
I also saw that this is an Akka issue and that it will be fixed in version 1.5.

How can I safely deploy jobs on Kubernetes?


Task manager logs

> 2018-03-06 07:23:18,186 INFO
> org.apache.flink.runtime.taskmanager.TaskManager              - Trying to
> register at JobManager akka.tcp://flink@flink-jobmanager:6123/user/jobmanager
> (attempt 1594, timeout: 30000 milliseconds)
> 2018-03-06 07:23:48,196 INFO
> org.apache.flink.runtime.taskmanager.TaskManager              - Trying to
> register at JobManager akka.tcp://flink@flink-jobmanager:6123/user/jobmanager
> (attempt 1595, timeout: 30000 milliseconds)
> 2018-03-06 07:24:18,216 INFO
> org.apache.flink.runtime.taskmanager.TaskManager              - Trying to
> register at JobManager akka.tcp://flink@flink-jobmanager:6123/user/jobmanager
> (attempt 1596, timeout: 30000 milliseconds)
> 2018-03-06 07:24:48,237 INFO
> org.apache.flink.runtime.taskmanager.TaskManager              - Trying to
> register at JobManager akka.tcp://flink@flink-jobmanager:6123/user/jobmanager
> (attempt 1597, timeout: 30000 milliseconds)
> 2018-03-06 07:24:53,042 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated
> for [5000] ms. Reason: [Disassociated]


Job manager logs

>
> 2018-03-06 07:25:18,262 INFO
> org.apache.flink.runtime.instance.InstanceManager             - Registered
> TaskManager at flink-taskmanager-3509325052-bqtkd
> (akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073/user/taskmanager)
> as c37614c28df29d34b80676488e386da3. Current number of registered hosts is
> 2. Current number of alive task slots is 16.
> 2018-03-06 07:25:18,263 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed,
> address is now gated for [5000] ms. Reason: [Association failed with
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by:
> [flink-taskmanager-3509325052-bqtkd: Temporary failure in name resolution]
> 2018-03-06 07:25:23,282 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed,
> address is now gated for [5000] ms. Reason: [Association failed with
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by:
> [flink-taskmanager-3509325052-bqtkd]
> 2018-03-06 07:25:28,303 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed,
> address is now gated for [5000] ms. Reason: [Association failed with
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by:
> [flink-taskmanager-3509325052-bqtkd: Temporary failure in name resolution]
> 2018-03-06 07:25:33,322 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed,
> address is now gated for [5000] ms. Reason: [Association failed with
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by:
> [flink-taskmanager-3509325052-bqtkd]
> 2018-03-06 07:25:38,343 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed,
> address is now gated for [5000] ms. Reason: [Association failed with
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by:
> [flink-taskmanager-3509325052-bqtkd: Temporary failure in name resolution]
> 2018-03-06 07:25:43,362 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed,
> address is now gated for [5000] ms. Reason: [Association failed with
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by:
> [flink-taskmanager-3509325052-bqtkd]
> 2018-03-06 07:25:48,383 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed,
> address is now gated for [5000] ms. Reason: [Association failed with
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by:
> [flink-taskmanager-3509325052-bqtkd: Temporary failure in name resolution]
> 2018-03-06 07:25:53,402 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed,
> address is now gated for [5000] ms. Reason: [Association failed with
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by:
> [flink-taskmanager-3509325052-bqtkd]
> 2018-03-06 07:25:58,423 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed,
> address is now gated for [5000] ms. Reason: [Association failed with
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by:
> [flink-taskmanager-3509325052-bqtkd: Temporary failure in name resolution]
> 2018-03-06 07:26:03,442 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed,
> address is now gated for [5000] ms. Reason: [Association failed with
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by:
> [flink-taskmanager-3509325052-bqtkd]
> 2018-03-06 07:26:08,463 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed,
> address is now gated for [5000] ms. Reason: [Association failed with
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by:
> [flink-taskmanager-3509325052-bqtkd: Temporary failure in name resolution]
> 2018-03-06 07:26:13,482 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed,
> address is now gated for [5000] ms. Reason: [Association failed with
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by:
> [flink-taskmanager-3509325052-bqtkd]
> 2018-03-06 07:26:18,504 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed,
> address is now gated for [5000] ms. Reason: [Association failed with
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by:
> [flink-taskmanager-3509325052-bqtkd: Temporary failure in name resolution]
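
Since the job manager log above reports "Temporary failure in name resolution"
for the task manager pod, a check along these lines could be run from inside
the job manager pod to see whether the hostnames resolve at all. This is only
a sketch: the helper name `resolves` is made up for illustration, the hostnames
are copied from the logs, and it assumes Python is available in the container.

```python
import socket


def resolves(hostname: str, port: int = 0) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, port)) > 0
    except socket.gaierror:
        # Raised for DNS failures such as "Temporary failure in name resolution".
        return False


if __name__ == "__main__":
    # Hostnames taken from the log excerpts; adjust them for your deployment.
    for host in ("flink-jobmanager", "flink-taskmanager-3509325052-bqtkd"):
        status = "resolves" if resolves(host) else "does NOT resolve"
        print(f"{host}: {status}")
```

If the task manager pod name does not resolve from the job manager pod, that
would be consistent with the repeated "Association failed" warnings above.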
