Great to hear that you've resolved the problem, and thanks for sharing the
solution. This will help others who run into something similar.
Cheers,
Till
On Wed, Aug 22, 2018, 16:14 Bruno Aranda wrote:
> Actually, I have found the issue. It was a simple thing, really, once you
> know it, of course.
Actually, I have found the issue. It was a simple thing, really, once you
know it, of course.
It was caused by the livenessProbe kicking in too early. For a Flink
cluster with several jobs, the default initial delay of 30 seconds I was
using (taken from the Flink helm chart in the examples) was not enough to
let the job manager recover the jobs, so Kubernetes kept restarting it
before recovery could finish. Increasing the delay fixed it.
Hi Bruno,
In order to debug this problem we would need a bit more information. In
particular, the logs of the cluster entrypoint and your K8s deployment
specification would be helpful. If you have specified any memory limits,
those would also be interesting to know.
Cheers,
Till
On Sun, Aug 19,
Hi Bruno,
Ping Till for you, he may give you some useful information.
Thanks, vino.
Bruno Aranda wrote on Sun, Aug 19, 2018 at 6:57 AM:
> Hi,
>
> I am experiencing an issue when a job manager is trying to recover using a
> HA setup. When the job manager starts again and tries to resume from the
> last checkpoints, it gets killed by Kubernetes (I guess).
Hi,
I am experiencing an issue when a job manager is trying to recover using a
HA setup. When the job manager starts again and tries to resume from the
last checkpoints, it gets killed by Kubernetes (I guess), since I can see
the following in the logs while the jobs are deployed:
INFO org.apache