Thanks Yang.
We did try both of those properties, and they didn't fix it. However, we did
EVENTUALLY (after some late nights!) track the issue down, not to DNS
resolution but to an obscure bug in our connector code :-(
Thanks for your response,
/David/
On Mon, Dec 2, 2019 at 3:16 AM Yang Wang wrote:
Hi David,
Do you mean that when the JobManager starts, DNS has a temporary problem and the
service cannot be resolved, and that even after DNS recovers the JobManager JVM
still cannot look up the name?
I think it may be caused by the JVM's DNS cache. You could set the TTL and give
it a try:
sun.net.inetaddr.ttl
sun.n
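(The second property is truncated above; it is presumably sun.net.inetaddr.negative.ttl. Below is a minimal sketch of the two usual ways to shorten the JVM's DNS cache; the 60/10 second values are only illustrative, not a recommendation.)

// Option 1: pass system properties to the JVM before it starts, e.g. via
// Flink's env.java.opts in flink-conf.yaml:
//   env.java.opts: "-Dsun.net.inetaddr.ttl=60 -Dsun.net.inetaddr.negative.ttl=10"
//
// Option 2: set the equivalent security properties programmatically,
// before the first DNS lookup happens in the JVM.
import java.security.Security;

public class DnsCacheTtl {
    public static void main(String[] args) {
        // Cache successful lookups for 60 seconds instead of indefinitely.
        Security.setProperty("networkaddress.cache.ttl", "60");
        // Keep failed lookups only briefly so a recovered name is retried soon.
        Security.setProperty("networkaddress.cache.negative.ttl", "10");
    }
}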
Hi,
The issue might be related to garbage collection pauses during which the TM
JVM cannot communicate with the JM.
The metrics include stats for memory consumption [1] and GC activity [2]
that can help to diagnose the problem.
Best, Fabian
[1]
https://ci.apache.org/projects/flink/flink-docs-re
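As a quick cross-check, the numbers behind those metrics come from the standard JVM MXBeans, so you can also dump heap usage and GC counts directly from inside the JVM. This is a generic Java sketch, not specific to Flink's metric names:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class JvmStats {
    public static void main(String[] args) {
        // Current heap usage, roughly what the memory metrics report.
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        System.out.printf("heap used: %d bytes, max: %d bytes%n",
                heap.getUsed(), heap.getMax());
        // Cumulative GC counts and time, roughly what the GC metrics report.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}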
Hi,
I checked the code again to figure out where the problem could be.
I just wondered if I'm implementing the Evictor correctly?
Full code:
https://gist.github.com/miko-code/6d7010505c3cb95be122364b29057237
public static class EsbTraceEvictor implements Evictor {
org.slf4j.Logger LOG =
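For comparison with the gist, here is a minimal evictor sketch against Flink's Evictor<T, W> interface. The class name, the max-age rule, and the TimeWindow choice are made up for illustration; the relevant parts are the evictBefore/evictAfter signatures and removing elements through the iterator:

import java.util.Iterator;
import org.apache.flink.streaming.api.windowing.evictors.Evictor;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.streaming.runtime.operators.windowing.TimestampedValue;

public class StaleElementEvictor<T> implements Evictor<T, TimeWindow> {

    private final long maxAgeMillis;

    public StaleElementEvictor(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    @Override
    public void evictBefore(Iterable<TimestampedValue<T>> elements, int size,
                            TimeWindow window, EvictorContext ctx) {
        evict(elements);
    }

    @Override
    public void evictAfter(Iterable<TimestampedValue<T>> elements, int size,
                           TimeWindow window, EvictorContext ctx) {
        // No-op: evicting only before the window function keeps behavior predictable.
    }

    private void evict(Iterable<TimestampedValue<T>> elements) {
        // Find the newest timestamp in the pane, then drop everything older than
        // maxAgeMillis relative to it (same idea as Flink's built-in TimeEvictor).
        long maxTs = Long.MIN_VALUE;
        for (TimestampedValue<T> e : elements) {
            maxTs = Math.max(maxTs, e.getTimestamp());
        }
        long cutoff = maxTs - maxAgeMillis;
        for (Iterator<TimestampedValue<T>> it = elements.iterator(); it.hasNext(); ) {
            if (it.next().getTimestamp() < cutoff) {
                it.remove();
            }
        }
    }
}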
Hi Timo, we have a similar issue: the TM gets killed by a job. Is there a way
to monitor JVM status? If it is through the monitoring metrics, which metrics
should I watch?
We are running Flink on K8s. Is it possible that a job consumes so much
network bandwidth that the JM and TM cannot connect?
On Tue
Hi Miki,
to me this sounds like your job has a resource leak, such that your
memory fills up and the JVM of the TaskManager is killed at some point.
What does your job look like? I see a WindowedStream.apply, which might
not be appropriate if you have big/frequent windows where the evaluation
h
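For context, the usual concern with WindowedStream.apply on big windows is that every element is buffered in window state until the window fires. An incremental AggregateFunction passed to WindowedStream.aggregate keeps only a small accumulator per key and window instead. A minimal sketch, with names and the usage comment being illustrative rather than taken from the original job:

import org.apache.flink.api.common.functions.AggregateFunction;

// Counts elements incrementally: the window state is a single Long per
// key/window instead of the full buffer of elements that apply() keeps.
public class CountElements<T> implements AggregateFunction<T, Long, Long> {
    @Override public Long createAccumulator() { return 0L; }
    @Override public Long add(T value, Long acc) { return acc + 1; }
    @Override public Long getResult(Long acc) { return acc; }
    @Override public Long merge(Long a, Long b) { return a + b; }
}

// Usage (key selector and window assigner are illustrative):
// stream.keyBy(e -> e.getKey())
//       .window(TumblingEventTimeWindows.of(Time.minutes(1)))
//       .aggregate(new CountElements<>());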