Re: [PR] [FLINK-37730][Job Manager] Expose JM exception as K8s exceptions [flink-kubernetes-operator]

via GitHub Fri, 09 May 2025 12:21:27 -0700


vsantwana commented on code in PR #978:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/978#discussion_r2080915688



##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/observer/deployment/ApplicationObserver.java:
##########
@@ -17,22 +17,50 @@
 
 package org.apache.flink.kubernetes.operator.observer.deployment;
 
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.JobID;
+import org.apache.flink.autoscaler.utils.DateTimeUtils;
 import org.apache.flink.kubernetes.operator.api.FlinkDeployment;
 import org.apache.flink.kubernetes.operator.api.status.FlinkDeploymentStatus;
 import org.apache.flink.kubernetes.operator.controller.FlinkResourceContext;
 import org.apache.flink.kubernetes.operator.observer.ClusterHealthObserver;
 import org.apache.flink.kubernetes.operator.observer.JobStatusObserver;
 import org.apache.flink.kubernetes.operator.observer.SnapshotObserver;
 import org.apache.flink.kubernetes.operator.utils.EventRecorder;
+import org.apache.flink.runtime.rest.messages.JobExceptionsInfoWithHistory;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.time.Instant;
+import java.time.ZoneId;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
 
 import static 
org.apache.flink.kubernetes.operator.config.KubernetesOperatorConfigOptions.OPERATOR_CLUSTER_HEALTH_CHECK_ENABLED;
 
 /** The observer of {@link 
org.apache.flink.kubernetes.operator.config.Mode#APPLICATION} cluster. */
 public class ApplicationObserver extends AbstractFlinkDeploymentObserver {
 
+    /** The cache entry for the last recorded exception timestamp. */
+    public static final class ExceptionCacheEntry {
+        final String jobId;
+        final long lastTimestamp;
+
+        ExceptionCacheEntry(String jobId, long lastTimestamp) {
+            this.jobId = jobId;
+            this.lastTimestamp = lastTimestamp;
+        }
+    }
+
+    @VisibleForTesting
+    final Map<String, ExceptionCacheEntry> lastRecordedExceptionCache = new 
ConcurrentHashMap<>();

Review Comment:
   The JobID is in ExceptionCacheEntry to make sure that we are tracking the 
exception of the correct JobId, if for the same ResourceID, the jobId has 
changed, then it is treated like we are seeing the exception for the first time 
and only the number of exception limit is applied



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] [FLINK-37730][Job Manager] Expose JM exception as K8s exceptions [flink-kubernetes-operator]

Reply via email to