Robert Metzger created FLINK-37730:
--------------------------------------

             Summary: Collect job exceptions as kubernetes events
                 Key: FLINK-37730
                 URL: https://issues.apache.org/jira/browse/FLINK-37730
             Project: Flink
          Issue Type: Improvement
          Components: Kubernetes Operator
            Reporter: Robert Metzger

As far as I understand, the Flink Kubernetes Operator does not currently track a job's exception history (the one listed in the JobManager UI).

Exposing the exception history in the CR is not feasible due to size concerns. Exposing it as Kubernetes events seems to be a reasonable middle ground. Events have a default expiration of 1 hour on the Kubernetes API server.

We could introduce a config parameter for the number of exceptions from the history to replicate into Kubernetes events.

For example, assume a Flink job has 5 exceptions and the user has configured the history size to be 4. FKO will regularly check whether events exist (matched by exception timestamp) for the last 4 exceptions. Any that are missing will be created.
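
To make the proposed check concrete, here is a rough sketch of the selection step only. It is an illustration, not operator code: `JobException`, `missing_exception_events`, and the idea of keying existing events by exception timestamp are assumptions made for this example, and it does not call the Flink REST API or the Kubernetes API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobException:
    """One entry from the job's exception history (hypothetical shape)."""
    timestamp_ms: int
    message: str


def missing_exception_events(history, existing_event_timestamps, history_size):
    """Return exceptions among the newest `history_size` entries that have no event yet.

    `existing_event_timestamps` is assumed to hold the exception timestamps of
    events that were already created for this job.
    """
    if history_size <= 0:
        return []
    # Keep only the newest `history_size` exceptions.
    newest = sorted(history, key=lambda e: e.timestamp_ms, reverse=True)[:history_size]
    missing = [e for e in newest if e.timestamp_ms not in existing_event_timestamps]
    # Oldest first, so events would be created in chronological order.
    return sorted(missing, key=lambda e: e.timestamp_ms)


# The scenario from the description: 5 exceptions, configured history size 4.
history = [JobException(1000 * i, f"failure {i}") for i in range(1, 6)]
already_reported = {4000, 5000}  # events for the two newest exceptions exist
to_create = missing_exception_events(history, already_reported, history_size=4)
print([e.timestamp_ms for e in to_create])  # prints [2000, 3000]
```

The oldest exception (timestamp 1000) falls outside the configured history size and is skipped. Because the check is keyed on the exception timestamp, it would recreate an event on a later pass once the API server has expired it, as long as the exception is still within the last N.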