Re: Job Manager killed by Kubernetes during recovery

vino yang Sun, 19 Aug 2018 05:43:19 -0700

Hi Bruno,

I am pinging Till for you; he may be able to give you some useful information.


Thanks, vino.

On Sun, Aug 19, 2018 at 6:57 AM, Bruno Aranda <bara...@apache.org> wrote:

> Hi,
>
> I am experiencing an issue when a job manager is trying to recover using an
> HA setup. When the job manager starts again and tries to resume from the
> last checkpoints, it gets killed by Kubernetes (I guess), since I can see
> the following in the logs while the jobs are deployed:
>
> INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -
> RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
>
> I am requesting enough memory for it, 3000Gi, and it is configured to use
> 2048Gb of memory. I have tried to increase the max perm size, but did not
> see an improvement.
>
> Any suggestions to help diagnose this?
>
> I have the following:
>
> Flink 1.6.0 (same with 1.5.1)
> Azure AKS with Kubernetes 1.11
> State management using RocksDB with checkpoints stored in Azure Data Lake
>
> Thanks!
>
> Bruno
>
>
