Hi Bruno,

I'm pinging Till for you; he may be able to give you some useful information.
Thanks,
vino

Bruno Aranda <bara...@apache.org> wrote on Sunday, August 19, 2018 at 6:57 AM:
> Hi,
>
> I am experiencing an issue when a job manager is trying to recover using an
> HA setup. When the job manager starts again and tries to resume from the
> last checkpoints, it gets killed by Kubernetes (I guess), since I can see
> the following in the logs while the jobs are deployed:
>
> INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
> RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
>
> I am requesting enough memory for it, 3000Gi, and it is configured to use
> 2048Gb of memory. I have tried to increase the max perm size, but did not
> see an improvement.
>
> Any suggestions to help diagnose this?
>
> I have the following:
>
> Flink 1.6.0 (same with 1.5.1)
> Azure AKS with Kubernetes 1.11
> State management using RocksDB with checkpoints stored in Azure Data Lake
>
> Thanks!
>
> Bruno