Robert Metzger created FLINK-37730:
--------------------------------------

             Summary: Collect job exceptions as kubernetes events
                 Key: FLINK-37730
                 URL: https://issues.apache.org/jira/browse/FLINK-37730
             Project: Flink
          Issue Type: Improvement
          Components: Kubernetes Operator
            Reporter: Robert Metzger


As far as I understand, the Flink Kubernetes Operator does not currently track 
a job's exception history, which is shown in the JobManager UI.
Exposing the exception history in the CR is not feasible because of its size.

Exposing the exception history as Kubernetes events seems to be a reasonable 
middle ground. Events have a default expiration of 1 hour on the Kubernetes 
API server.

We could introduce a config parameter for the number of exceptions from the 
history to replicate into k8s events.

For example, assume a Flink job has 5 exceptions and the user has configured a 
history size of 4. FKO will regularly check whether events exist (matched on 
the exception timestamp) for the 4 most recent exceptions. Any that are 
missing will be created.
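
The check described above could be sketched roughly as follows. This is only 
an illustration of the selection logic, not operator code: the names 
JobException, exceptions_missing_events, and history_size are hypothetical, 
and the set of already reported timestamps stands in for whatever lookup 
against existing Kubernetes events the operator would actually perform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobException:
    # Hypothetical record for one entry of the job's exception history.
    timestamp_ms: int
    message: str


def exceptions_missing_events(history, reported_timestamps, history_size):
    """Return the newest `history_size` exceptions that have no event yet.

    `reported_timestamps` is the set of exception timestamps for which an
    event already exists. The result is ordered oldest first.
    """
    if history_size <= 0:
        return []
    # Keep only the most recent `history_size` entries of the history.
    newest = sorted(history, key=lambda e: e.timestamp_ms)[-history_size:]
    # An exception needs an event only if none exists for its timestamp.
    return [e for e in newest if e.timestamp_ms not in reported_timestamps]


# The example from the text: 5 exceptions, configured history size of 4.
history = [
    JobException(ts, f"failure {i}")
    for i, ts in enumerate([1000, 2000, 3000, 4000, 5000], start=1)
]
already_reported = {2000, 3000}
missing = exceptions_missing_events(history, already_reported, history_size=4)
print([e.timestamp_ms for e in missing])  # prints [4000, 5000]
```

Matching on the timestamp keeps the check idempotent: running it repeatedly 
creates each event at most once, and an event that has expired on the API 
server is recreated only while its exception is still among the newest 
entries.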



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
