[ https://issues.apache.org/jira/browse/FLINK-36140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabor Somogyi resolved FLINK-36140.
-----------------------------------
Fix Version/s: 2.0.0
Resolution: Fixed

9bcd8f4 on master

> Log a warning when pods are terminated by kubernetes
> ----------------------------------------------------
>
> Key: FLINK-36140
> URL: https://issues.apache.org/jira/browse/FLINK-36140
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / Kubernetes
> Affects Versions: 1.19.1
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.0.0
>
>
> Scheduled maintenance or buggy nodes on Kubernetes can result in random pod
> termination and, eventually, a series of job restarts as the Kubernetes
> cluster nodes go through a rolling restart. The larger the job, the higher
> the chance that it is affected. Although jobs should be able to auto-recover
> from these issues, the restarts can cause unwanted turbulence in large-scale
> pipelines.
> In this case, it is very difficult to identify what is causing the restarts
> without knowing about the issue at the Kubernetes layer and the keyword to
> search for, because the pod termination is logged at INFO level.
> We need to log this at a higher level. If changing it from INFO to ERROR
> breaks monitoring, we should at least log it as a warning.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)