Hi Zhu,

Looks like it's expected. Those are the cases that happened on our cluster.
Thanks for your response, Zhu
Cam

On Sun, Aug 11, 2019 at 10:53 PM Zhu Zhu <reed...@gmail.com> wrote:
> Another possibility is that the JM is killed externally, e.g. K8s may kill
> the JM/TM if it exceeds its resource limit.
>
> Thanks,
> Zhu Zhu
>
> On Mon, Aug 12, 2019 at 1:45 PM Zhu Zhu <reed...@gmail.com> wrote:
>
>> Hi Cam,
>>
>> The Flink master should not die when it gets disconnected from task managers.
>> It may exit in the following cases:
>> 1. The job terminated (FINISHED/FAILED/CANCELED). If your job is
>> configured with no restart retries, a TM failure can cause the job to be
>> FAILED.
>> 2. The JM lost HA leadership, e.g. it lost its connection to ZK.
>> 3. It encountered some other unexpected fatal error. In this case we need to
>> check the logs to see what happened.
>>
>> Thanks,
>> Zhu Zhu
>>
>> On Mon, Aug 12, 2019 at 12:15 PM Cam Mach <cammac...@gmail.com> wrote:
>>
>>> Hello Flink experts,
>>>
>>> We are running Flink on Kubernetes and see that the Job Manager
>>> dies/restarts whenever a Task Manager dies/restarts or they can't connect
>>> to each other. Are there any specific configurations/parameters that we
>>> need to turn on to stop this? Or is this expected?
>>>
>>> Thanks,
>>> Cam