Re: Temporary failure in name resolution on JobManager

2019-12-02 Thread David Maddison
…Restarting the JobManager JVM does successfully recover the Job, but I'd
>> like to avoid having to do that if possible.
>>
>> Caused by: java.net.UnknownHostException: <>.com: Temporary
>> failure in name resolution
>> at java.net.Inet4AddressImpl.lookupAllH…

Re: Temporary failure in name resolution on JobManager

2019-12-01 Thread Yang Wang
…t install a SecurityManager and therefore the
> JVM should only cache invalid name requests for 10 seconds.
>
> Restarting the JobManager JVM does successfully recover the Job, but I'd
> like to avoid having to do that if possible.
>
> Caused by…
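
The 10-second figure refers to the JVM's negative DNS cache, controlled by the standard security property `networkaddress.cache.negative.ttl` (its positive counterpart is `networkaddress.cache.ttl`). A minimal sketch for checking and overriding both from code, assuming no SecurityManager blocks `Security.setProperty`; the class name and the example values are illustrative only:

```java
import java.security.Security;

// Sketch: read and override the JVM's DNS cache lifetimes via the standard
// java.security properties. The values passed to configure() are examples only.
class DnsCacheSettings {

    // Current negative-cache TTL in seconds as a string, or null if the property is unset.
    static String negativeTtl() {
        return Security.getProperty("networkaddress.cache.negative.ttl");
    }

    // Overrides both TTLs (0 = do not cache, -1 = cache forever). The JVM typically
    // reads these once, so call this before the first host-name lookup.
    static void configure(int positiveTtlSeconds, int negativeTtlSeconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(positiveTtlSeconds));
        Security.setProperty("networkaddress.cache.negative.ttl", Integer.toString(negativeTtlSeconds));
    }

    public static void main(String[] args) {
        System.out.println("negative TTL before: " + negativeTtl());
        configure(60, 10);
        System.out.println("negative TTL after: " + negativeTtl());
    }
}
```

If lookups keep failing long after the negative TTL has expired, the cached failure is probably not the cause and the resolver itself is worth checking.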

Temporary failure in name resolution on JobManager

2019-11-29 Thread David Maddison
…successfully recover the Job, but I'd like to avoid having to do that if possible.

Caused by: java.net.UnknownHostException: <****>.com: Temporary failure in name resolution
    at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.l…
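
Since the failure is reported as *temporary*, one application-side mitigation is to retry the lookup with backoff instead of failing on the first `UnknownHostException`. This is a generic sketch, not something Flink does for you; the class name, attempt count and delays are assumptions:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch: resolve a host name with a bounded number of attempts and exponential
// backoff between them. Attempt counts and delays are illustrative, not tuned values.
class ResolveWithRetry {

    static InetAddress[] resolve(String host, int maxAttempts, long initialBackoffMillis)
            throws UnknownHostException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMillis;
        UnknownHostException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return InetAddress.getAllByName(host);
            } catch (UnknownHostException e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoff); // wait before the next attempt
                    backoff *= 2;          // double the delay each time
                }
            }
        }
        throw lastFailure; // every attempt failed; rethrow the most recent error
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        for (InetAddress a : resolve(host, 4, 2_000)) {
            System.out.println(host + " -> " + a.getHostAddress());
        }
    }
}
```

Within a single JVM, a retry issued while the negative cache still holds the failure may just return the cached error, so the total backoff should exceed the negative TTL (with 4 attempts starting at 2 s, the waits add up to 14 s).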

Re: Temporary failure in name resolution

2018-04-04 Thread Fabian Hueske
Hi, the issue might be related to garbage collection pauses during which the TM JVM cannot communicate with the JM. The metrics include statistics for memory consumption [1] and GC activity [2] that can help to diagnose the problem. Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-re…
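
Flink reports GC activity through its own metrics [2]; the JVM exposes similar counters through `java.lang.management`, which is handy for a quick check outside the metrics system. A minimal sketch (class name and sampling interval are illustrative) that sums collection counts and accumulated collection time across all collectors:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: total GC count and accumulated GC time (ms) across all collectors.
class GcStats {

    static long[] totals() {
        long count = 0;
        long timeMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when a collector does not report the value; skip those.
            long c = gc.getCollectionCount();
            long t = gc.getCollectionTime();
            if (c > 0) count += c;
            if (t > 0) timeMillis += t;
        }
        return new long[] {count, timeMillis};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] before = totals();
        Thread.sleep(10_000); // sampling interval; an example value
        long[] after = totals();
        // A large jump in GC time over the interval may indicate long pauses during
        // which the process cannot answer heartbeats.
        System.out.printf("GCs: %d, GC time: %d ms over the last interval%n",
                after[0] - before[0], after[1] - before[1]);
    }
}
```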

Re: Temporary failure in name resolution

2018-04-03 Thread miki haiat
Hi, I checked the code again to figure out where the problem could be. I just wondered if I'm implementing the Evictor correctly. Full code: https://gist.github.com/miko-code/6d7010505c3cb95be122364b29057237 public static class EsbTraceEvictor implements Evictor { org.slf4j.Logger LOG =…
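
In Flink, an evictor's `evictBefore`/`evictAfter` methods receive the window contents as an `Iterable`, and the built-in evictors drop elements by calling `remove()` on its iterator; an evictor that never removes anything leaves the whole window in state. The sketch below models that pattern in plain Java so it runs without Flink: `Timestamped` is a hypothetical stand-in for Flink's `TimestampedValue`, and the time-based rule is only similar in spirit to Flink's `TimeEvictor`:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Plain-Java model of in-place eviction through an iterator (not Flink's API).
class EvictionSketch {

    // Hypothetical stand-in for a timestamped window element.
    static final class Timestamped<T> {
        final T value;
        final long timestamp;

        Timestamped(T value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Removes, in place, every element older than (newest timestamp - keepMillis).
    static <T> void evictOlderThan(Iterable<Timestamped<T>> elements, long keepMillis) {
        long newest = Long.MIN_VALUE;
        for (Timestamped<T> e : elements) {
            newest = Math.max(newest, e.timestamp);
        }
        if (newest == Long.MIN_VALUE) {
            return; // empty window, nothing to evict
        }
        long cutoff = newest - keepMillis;
        for (Iterator<Timestamped<T>> it = elements.iterator(); it.hasNext(); ) {
            if (it.next().timestamp < cutoff) {
                it.remove(); // eviction only takes effect through the iterator
            }
        }
    }

    public static void main(String[] args) {
        List<Timestamped<String>> window = new ArrayList<>();
        window.add(new Timestamped<>("a", 0));
        window.add(new Timestamped<>("b", 500));
        window.add(new Timestamped<>("c", 1_000));
        window.add(new Timestamped<>("d", 1_500));
        evictOlderThan(window, 600);
        for (Timestamped<String> e : window) {
            System.out.println(e.value + "@" + e.timestamp);
        }
    }
}
```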

Re: Temporary failure in name resolution

2018-04-03 Thread Hao Sun
Hi Timo, we have a similar issue: a TM got killed by a job. Is there a way to monitor the JVM status? If it is through the metrics, which metrics should I look at? We are running Flink on K8S. Is it possible that a job consumes so much network bandwidth that the JM and TM cannot connect? On Tue…
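
For the JVM-status question, heap and non-heap usage are available from `MemoryMXBean`, the same kind of numbers reported by Flink's memory metrics [1]. A minimal sketch (class name is illustrative); on Kubernetes, keep in mind that the container memory limit covers the whole process, so a pod can be killed while the heap alone still looks healthy:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Sketch: report heap and non-heap usage of the current JVM.
class HeapUsage {

    // Fraction of the maximum heap currently in use, or -1 if the JVM reports no maximum.
    static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when undefined
        return max > 0 ? (double) heap.getUsed() / max : -1.0;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
        System.out.printf("heap: used=%d MiB committed=%d MiB max=%d MiB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
        System.out.printf("non-heap: used=%d MiB%n", nonHeap.getUsed() >> 20);
        System.out.printf("heap used fraction: %.2f%n", heapUsedFraction());
    }
}
```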

Re: Temporary failure in name resolution

2018-04-03 Thread Timo Walther
Hi Miki, to me this sounds like your job has a resource leak, so that memory fills up and the TaskManager JVM is killed at some point. What does your job look like? I see a WindowedStream.apply, which might not be appropriate if you have big/frequent windows where the evaluation h…
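
The concern with `WindowedStream.apply` is state size: a full-window function keeps every element of the window until the window fires, whereas incremental functions (`reduce`/`aggregate` on a `WindowedStream`) fold each element into an accumulator on arrival. The plain-Java sketch below contrasts the two for an average; both classes are hypothetical illustrations, not Flink code:

```java
import java.util.ArrayList;
import java.util.List;

// Contrast of window state: buffering all elements vs. an incremental accumulator.
class WindowStateSketch {

    // Like a full-window function: state grows with every element until fire().
    static final class BufferingWindow {
        private final List<Long> buffer = new ArrayList<>();

        void add(long v) { buffer.add(v); }

        int stateSize() { return buffer.size(); } // values held in state

        double fire() {
            if (buffer.isEmpty()) return 0.0;
            double sum = 0;
            for (long v : buffer) sum += v;
            return sum / buffer.size();
        }
    }

    // Like an incremental aggregate: constant state (a count and a sum).
    static final class IncrementalWindow {
        private long count;
        private long sum;

        void add(long v) { count++; sum += v; }

        int stateSize() { return 2; } // count and sum only

        double fire() { return count == 0 ? 0.0 : (double) sum / count; }
    }

    public static void main(String[] args) {
        BufferingWindow buffering = new BufferingWindow();
        IncrementalWindow incremental = new IncrementalWindow();
        for (long v = 1; v <= 1_000; v++) {
            buffering.add(v);
            incremental.add(v);
        }
        System.out.println("buffering: avg=" + buffering.fire() + " state=" + buffering.stateSize());
        System.out.println("incremental: avg=" + incremental.fire() + " state=" + incremental.stateSize());
    }
}
```

Both produce the same result; only the buffering variant's state grows with the window's contents.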

Re: akka.remote.ReliableDeliverySupervisor Temporary failure in name resolution

2018-03-06 Thread Nico Kruber
…rting them again worked without a flaw. My bet is on something Flink-external because of the "Temporary failure in name resolution" error message. Maybe @Patrick (cc'd) has encountered this before and knows more. Nico [1] https://ci.apache.org/projects/flink/flink-docs-r…
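
One way to test the "Flink-external" hypothesis is to resolve the affected host names from a small standalone program, outside Flink, and record how long each lookup takes. A minimal sketch; the class name is illustrative, and the host names are whatever you pass on the command line (for example, the TaskManager pod name from the log):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch: resolve each given host once and report success/failure and latency.
class DnsProbe {

    static final class Result {
        final String host;
        final boolean resolved;
        final long elapsedMillis;
        final String detail; // resolved address, or the exception message

        Result(String host, boolean resolved, long elapsedMillis, String detail) {
            this.host = host;
            this.resolved = resolved;
            this.elapsedMillis = elapsedMillis;
            this.detail = detail;
        }
    }

    static Result probe(String host) {
        long start = System.nanoTime();
        try {
            InetAddress addr = InetAddress.getByName(host);
            return new Result(host, true, (System.nanoTime() - start) / 1_000_000, addr.getHostAddress());
        } catch (UnknownHostException e) {
            return new Result(host, false, (System.nanoTime() - start) / 1_000_000, e.getMessage());
        }
    }

    public static void main(String[] args) {
        String[] hosts = args.length > 0 ? args : new String[] {"localhost"};
        for (String h : hosts) {
            Result r = probe(h);
            System.out.printf("%s resolved=%b in %d ms (%s)%n", r.host, r.resolved, r.elapsedMillis, r.detail);
        }
    }
}
```

Because the JVM caches lookups, repeated probes inside one process may only show cached results; running it as a fresh process each time avoids that.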

akka.remote.ReliableDeliverySupervisor Temporary failure in name resolution

2018-03-06 Thread miki haiat
…visor
> - Association with remote system
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073] has failed,
> address is now gated for [5000] ms. Reason: [Association failed with
> [akka.tcp://flink@flink-taskmanager-3509325052-bqtkd:35073]] Caused by:
> [flink…
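
"Gated for [5000] ms" means that after the failed association, the remote address is temporarily blocked and messages to it are dropped until the gate period ends, after which a new connection attempt can be made. The class below is a simplified, hypothetical model of that timing, not Akka's implementation:

```java
// Simplified model (hypothetical, not Akka's code) of address gating: after a
// failed association, the address stays gated until the gate period has elapsed.
class AddressGate {
    private final long gateMillis;
    private long gatedUntilMillis = Long.MIN_VALUE; // not gated initially

    AddressGate(long gateMillis) {
        this.gateMillis = gateMillis;
    }

    // Called when an association attempt fails at time nowMillis.
    void onAssociationFailed(long nowMillis) {
        gatedUntilMillis = nowMillis + gateMillis;
    }

    // True while the gate is closed, i.e. messages to the address would be dropped.
    boolean isGated(long nowMillis) {
        return nowMillis < gatedUntilMillis;
    }

    public static void main(String[] args) {
        AddressGate gate = new AddressGate(5_000); // matches "gated for [5000] ms" in the log
        gate.onAssociationFailed(1_000);
        System.out.println(gate.isGated(3_000)); // true
        System.out.println(gate.isGated(6_000)); // false
    }
}
```

If the host name keeps failing to resolve, each new attempt fails and re-closes the gate, which would explain the same warning repeating in the log.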