gyfora commented on code in PR #978:
URL: https://github.com/apache/flink-kubernetes-operator/pull/978#discussion_r2077934460

##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/ApplicationReconciler.java:
##########
@@ -299,9 +303,92 @@ public boolean reconcileOtherChanges(FlinkResourceContext<FlinkDeployment> ctx)
             return true;
         }
 
+        // check for JobManager exceptions if the REST API server is still up.
+        if (!ReconciliationUtils.isJobInTerminalState(deployment.getStatus())) {
+            observeJobManagerExceptions(ctx, deployment, observeConfig);
+        }
+
         return cleanupTerminalJmAfterTtl(ctx.getFlinkService(), deployment, observeConfig);
     }
 
+    private void observeJobManagerExceptions(
+            FlinkResourceContext<FlinkDeployment> ctx,
+            FlinkDeployment deployment,
+            Configuration observeConfig) {
+        try {
+            var jobId = JobID.fromHexString(deployment.getStatus().getJobStatus().getJobId());
+            var history = ctx.getFlinkService().getJobExceptions(deployment, jobId, observeConfig);
+            if (history == null || history.getExceptionHistory() == null) {
+                return;
+            }
+            var exceptionHistory = history.getExceptionHistory();
+            var exceptions = exceptionHistory.getEntries();
+            if (exceptions.isEmpty()) {
+                LOG.info(String.format("No exceptions found in job exception history for jobId '%s'.", jobId));
+                return;
+            }
+            if (exceptionHistory.isTruncated()) {
+                LOG.warn(String.format("Job exception history is truncated for jobId '%s'. " +
+                        "Some exceptions are not shown.", jobId));
+            }
+            for (var exception : exceptions) {
+                emitJobManagerExceptionEvent(ctx, deployment, exception);

Review Comment:
   A simpler alternative implementation that would not involve any status changes would be simply to have an in-memory cache of the last emitted exception timestamp on a per-resource level. I would go down this route instead.
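
For illustration only, the in-memory cache suggested in the comment could look roughly like the sketch below. Everything in it is hypothetical and not taken from the PR or the operator codebase: the class name `EmittedExceptionCache`, the methods `selectNew` and `forget`, the `"namespace/name"` key format, and the use of plain epoch-millisecond `long` timestamps. It also assumes that reconciliation for a given resource is not run concurrently with itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (not part of the operator): remembers, per resource,
 * the newest exception timestamp for which an event was already emitted.
 */
final class EmittedExceptionCache {

    // Key is an assumed resource identifier such as "namespace/name".
    private final Map<String, Long> lastEmitted = new ConcurrentHashMap<>();

    /**
     * Returns the timestamps that are strictly newer than the last emitted one
     * for this resource, and advances the stored timestamp to the newest of them.
     */
    List<Long> selectNew(String resourceKey, List<Long> exceptionTimestamps) {
        long watermark = lastEmitted.getOrDefault(resourceKey, Long.MIN_VALUE);
        long newest = watermark;
        List<Long> fresh = new ArrayList<>();
        for (long ts : exceptionTimestamps) {
            if (ts > watermark) {
                fresh.add(ts);
                newest = Math.max(newest, ts);
            }
        }
        if (!fresh.isEmpty()) {
            lastEmitted.put(resourceKey, newest);
        }
        return fresh;
    }

    /** Drops the entry, for example when the resource is deleted. */
    void forget(String resourceKey) {
        lastEmitted.remove(resourceKey);
    }
}
```

With a cache like this, the loop over `exceptions` would emit events only for entries whose timestamp passes the filter, and no status field is needed. The trade-offs are that the cache is lost when the operator restarts, so already-reported exceptions may be emitted once more afterwards, and that an exception arriving later with exactly the same timestamp as the stored one would be skipped.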