Hi. My Flink Deployment is configured to use savepoints for upgrades and to take a savepoint before stopping. When rescaling happens, for some reason it scales the JobManager to zero (“Scaling JobManager Deployment to zero with 300 seconds timeout”) and the job goes into the FINISHED state. It doesn’t seem to be able to continue from there. Any idea why it is deleting itself? The savepoints are stored on S3. I can restart the job from a savepoint manually, but at the next rescaling operation it deletes itself again.
Thanks!

Log:
2024-04-22 23:39:17,016 o.a.f.k.o.o.d.ApplicationObserver [INFO ][flink/f-d7681d0f-c093-5d8a-b5f5-2b66b4547bf6] Observing JobManager deployment. Previous status: DEPLOYING
2024-04-22 23:39:17,024 o.a.f.k.o.o.d.ApplicationObserver [INFO ][flink/f-d7681d0f-c093-5d8a-b5f5-2b66b4547bf6] JobManager is being deployed
2024-04-22 23:39:17,045 o.a.f.a.JobAutoScalerImpl [INFO ][flink/f-d7681d0f-c093-5d8a-b5f5-2b66b4547bf6] Cleaning up autoscaling meta data
2024-04-22 23:39:17,046 o.a.f.k.o.s.AbstractFlinkService [INFO ][flink/f-d7681d0f-c093-5d8a-b5f5-2b66b4547bf6] Deleting cluster with Foreground propagation
2024-04-22 23:39:17,046 o.a.f.k.o.s.AbstractFlinkService [INFO ][flink/f-d7681d0f-c093-5d8a-b5f5-2b66b4547bf6] Scaling JobManager Deployment to zero with 300 seconds timeout...
2024-04-22 23:39:18,941 o.a.f.r.u.c.m.ProcessMemoryUtils [INFO ][flink/f-cdda7a8d-8259-5137-8397-8125b212e556] The derived from fraction jvm overhead memory (102.400mb (107374184 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2024-04-22 23:39:18,982 o.a.f.a.ScalingMetricCollector [WARN ][flink/f-cdda7a8d-8259-5137-8397-8125b212e556] pendingRecords metric for e22861b1943d40ca6f5a40ae6332d42b could not be found. Either a legacy source or an idle source. Assuming no pending records.
2024-04-22 23:39:18,984 o.a.f.a.ScalingMetricCollector [WARN ][flink/f-cdda7a8d-8259-5137-8397-8125b212e556] pendingRecords metric for e503bd1c0fb6799d36f0c4b786e69fd9 could not be found. Either a legacy source or an idle source. Assuming no pending records.
2024-04-22 23:39:18,985 o.a.f.a.ScalingMetricCollector [WARN ][flink/f-cdda7a8d-8259-5137-8397-8125b212e556] pendingRecords metric for de478b666343f04920f1cd3e6e65548c could not be found. Either a legacy source or an idle source. Assuming no pending records.
2024-04-22 23:39:18,987 o.a.f.a.ScalingMetricCollector [WARN ][flink/f-cdda7a8d-8259-5137-8397-8125b212e556] pendingRecords metric for 0956ec3e5721a3bc416c208d690e220a could not be found. Either a legacy source or an idle source. Assuming no pending records.
2024-04-22 23:39:18,988 o.a.f.a.ScalingMetricCollector [WARN ][flink/f-cdda7a8d-8259-5137-8397-8125b212e556] pendingRecords metric for 2a0e1e1304dc61d997d3e6f7025df9b3 could not be found. Either a legacy source or an idle source. Assuming no pending records.
2024-04-22 23:39:18,989 o.a.f.a.ScalingMetricCollector [WARN ][flink/f-cdda7a8d-8259-5137-8397-8125b212e556] pendingRecords metric for 09152f8c760b2503dd4174abf81781b6 could not be found. Either a legacy source or an idle source. Assuming no pending records.
2024-04-22 23:39:18,991 o.a.f.a.ScalingMetricCollector [WARN ][flink/f-cdda7a8d-8259-5137-8397-8125b212e556] pendingRecords metric for cbe38acc7ac41cc91794215391eedc28 could not be found. Either a legacy source or an idle source. Assuming no pending records.
2024-04-22 23:39:18,992 o.a.f.a.ScalingMetricCollector [WARN ][flink/f-cdda7a8d-8259-5137-8397-8125b212e556] pendingRecords metric for cc2707c5c5b5cdd319d28980a6ad99d0 could not be found. Either a legacy source or an idle source. Assuming no pending records.
2024-04-22 23:39:18,994 o.a.f.a.ScalingMetricCollector [WARN ][flink/f-cdda7a8d-8259-5137-8397-8125b212e556] pendingRecords metric for d2fd710697779b81072031f8924b5967 could not be found. Either a legacy source or an idle source. Assuming no pending records.
2024-04-22 23:39:18,995 o.a.f.a.ScalingMetricCollector [WARN ][flink/f-cdda7a8d-8259-5137-8397-8125b212e556] pendingRecords metric for 9892bf1d00288ef951498dd18f18dd24 could not be found. Either a legacy source or an idle source. Assuming no pending records.
2024-04-22 23:39:18,997 o.a.f.a.ScalingMetricCollector [WARN ][flink/f-cdda7a8d-8259-5137-8397-8125b212e556] pendingRecords metric for acf3064b984e12010bfeccbe4e28d9a5 could not be found. Either a legacy source or an idle source. Assuming no pending records.
2024-04-22 23:39:18,998 o.a.f.a.ScalingMetricCollector [WARN ][flink/f-cdda7a8d-8259-5137-8397-8125b212e556] pendingRecords metric for 8737aeacfef5f24708290203c9de47e1 could not be found. Either a legacy source or an idle source. Assuming no pending records.
2024-04-22 23:39:18,999 o.a.f.a.ScalingMetricCollector [WARN ][flink/f-cdda7a8d-8259-5137-8397-8125b212e556] pendingRecords metric for 45aa9886e32677c3f13caa2aa54ec3ad could not be found. Either a legacy source or an idle source. Assuming no pending records.
2024-04-22 23:39:19,001 o.a.f.a.ScalingMetricCollector [WARN ][flink/f-cdda7a8d-8259-5137-8397-8125b212e556] pendingRecords metric for 1d4f6379179c05db8001430cd4b772f7 could not be found. Either a legacy source or an idle source. Assuming no pending records.
2024-04-22 23:39:19,003 o.a.f.a.ScalingMetricCollector [WARN ][flink/f-cdda7a8d-8259-5137-8397-8125b212e556] pendingRecords metric for 2de822e03c5f4f6b0ca9ea8210622acb could not be found. Either a legacy source or an idle source. Assuming no pending records.
2024-04-22 23:39:19,151 o.a.f.a.ScalingExecutor [INFO ][flink/f-cdda7a8d-8259-5137-8397-8125b212e556] All job vertices are currently running at their target parallelism.
2024-04-22 23:39:19,218 o.a.f.k.o.r.d.AbstractFlinkResourceReconciler [INFO ][flink/f-cdda7a8d-8259-5137-8397-8125b212e556] Resource fully reconciled, nothing to do...
2024-04-22 23:39:20,535 o.a.f.k.o.s.AbstractFlinkService [INFO ][flink/f-d7681d0f-c093-5d8a-b5f5-2b66b4547bf6] Completed Scaling JobManager Deployment to zero
2024-04-22 23:39:20,536 o.a.f.k.o.s.AbstractFlinkService [INFO ][flink/f-d7681d0f-c093-5d8a-b5f5-2b66b4547bf6] Deleting JobManager Deployment with 296 seconds timeout...
2024-04-22 23:39:20,631 o.a.f.k.o.s.AbstractFlinkService [INFO ][flink/f-d7681d0f-c093-5d8a-b5f5-2b66b4547bf6] Completed Deleting JobManager Deployment
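
Note that the log interleaves entries for two different resources (flink/f-d7681d0f-... and flink/f-cdda7a8d-...), and only the first one is being deleted. In case it helps anyone reading along, here is a minimal Python sketch that groups the lines by resource so each deployment can be read separately. The regular expression is only my assumption about the line layout, inferred from the lines pasted above; `LOG_LINE` and `group_by_resource` are names made up for this illustration, not part of Flink or the operator.

```python
import re
from collections import defaultdict

# Assumed layout, inferred from the pasted log (not an official format):
# "<date> <time>,<ms> <logger> [<LEVEL> ][<namespace>/<resource>] <message>"
LOG_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<logger>\S+) "
    r"\[(?P<level>[A-Z]+)\s*\]"
    r"\[(?P<resource>[^\]]+)\] "
    r"(?P<msg>.*)"
)


def group_by_resource(lines):
    """Return {resource: [(timestamp, level, message), ...]} in input order."""
    groups = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:  # lines that do not fit the assumed layout are skipped
            groups[match["resource"]].append(
                (match["ts"], match["level"], match["msg"])
            )
    return dict(groups)


# Three lines copied from the log above.
sample = [
    "2024-04-22 23:39:17,046 o.a.f.k.o.s.AbstractFlinkService [INFO ][flink/f-d7681d0f-c093-5d8a-b5f5-2b66b4547bf6] Deleting cluster with Foreground propagation",
    "2024-04-22 23:39:19,151 o.a.f.a.ScalingExecutor [INFO ][flink/f-cdda7a8d-8259-5137-8397-8125b212e556] All job vertices are currently running at their target parallelism.",
    "2024-04-22 23:39:20,631 o.a.f.k.o.s.AbstractFlinkService [INFO ][flink/f-d7681d0f-c093-5d8a-b5f5-2b66b4547bf6] Completed Deleting JobManager Deployment",
]

if __name__ == "__main__":
    for resource, entries in group_by_resource(sample).items():
        print(resource)
        for ts, level, msg in entries:
            print(f"  {ts} {level:<5} {msg}")
```

Run against the full log, this should make it easier to see which messages belong to the deployment that gets deleted and which belong to the other one.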