The logs before this point were all job deployment logs. Then the JM suddenly received signal 15 (SIGTERM), and all components started to shut down:

2022-07-18 20:31:27,813 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - theOpeartion namexxxxx (839/3000) (f913468fb654c6d2c3466ef28d296396) switched from INITIALIZING to RUNNING.
2022-07-18 20:31:27,879 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2022-07-18 20:31:27,879 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - theOpeartion namexxxxx (2093/3000) (0a85fc4259b8e99dee2e9761131cac51) switched from DEPLOYING to INITIALIZING.
2022-07-18 20:31:27,882 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Shutting YarnJobClusterEntrypoint down with application status UNKNOWN. Diagnostics Cluster entrypoint has been closed externally..
2022-07-18 20:31:27,884 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Shutting down rest endpoint.
2022-07-18 20:31:27,886 INFO org.apache.flink.runtime.blob.BlobServer [] - Stopped BLOB server at 0.0.0.0:45857
2022-07-18 20:31:27,988 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Removing cache directory /tmp/flink-web-3c48a98c-6642-40de-a52a-1794256c4495/flink-web-ui
2022-07-18 20:31:27,991 INFO org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Stopping DefaultLeaderElectionService.
2022-07-18 20:31:27,991 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver [] - Closing ZooKeeperLeaderElectionDriver{leaderLatchPath='/leader/rest_server/latch', connectionInformationPath='/leader/rest_server/connection_info'}
2022-07-18 20:31:28,001 INFO org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Shut down complete.
2022-07-18 20:31:28,001 INFO org.apache.flink.runtime.entrypoint.component.DispatcherResourceManagerComponent [] - Closing components.
2022-07-18 20:31:28,001 INFO org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - Stopping DefaultLeaderRetrievalService.
2022-07-18 20:31:28,001 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver [] - Closing ZookeeperLeaderRetrievalDriver{connectionInformationPath='/leader/dispatcher/connection_info'}.
2022-07-18 20:31:28,001 INFO org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - Stopping DefaultLeaderRetrievalService.
2022-07-18 20:31:28,001 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver [] - Closing ZookeeperLeaderRetrievalDriver{connectionInformationPath='/leader/resource_manager/connection_info'}.
2022-07-18 20:31:28,001 INFO org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Stopping DefaultLeaderElectionService.
2022-07-18 20:31:28,001 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver [] - Closing ZooKeeperLeaderElectionDriver{leaderLatchPath='/leader/dispatcher/latch', connectionInformationPath='/leader/dispatcher/connection_info'}
2022-07-18 20:31:28,020 INFO org.apache.flink.runtime.dispatcher.MiniDispatcher [] - Stopping dispatcher akka.tcp://flink@ip:43728/user/rpc/dispatcher_0.
2022-07-18 20:31:28,023 INFO org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl [] - Interrupted while waiting for queue
java.lang.InterruptedException: null
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014) ~[?:1.8.0_322]
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048) ~[?:1.8.0_322]
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) ~[?:1.8.0_322]
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:323) [hadoop-yarn-client-2.8.5.jar:?]
2022-07-18 20:31:28,029 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Stopping the JobMaster for job 'flink_test' (7a9a02f6aa168abc927732855e3d230f).
2022-07-18 20:31:28,067 INFO org.apache.flink.runtime.dispatcher.MiniDispatcher [] - Job 7a9a02f6aa168abc927732855e3d230f reached terminal state SUSPENDED.

---- Replied Message ----
From: Zhanghao Chen <zhanghao.c...@outlook.com>
Date: 07/18/2022 21:19
To: SmileSmile <a511955...@163.com>, user <user@flink.apache.org>
Cc:
Subject: Re: flink on yarn job always restart

Hi, could you provide the whole JM log?

Best,
Zhanghao Chen

From: SmileSmile <a511955...@163.com>
Sent: Monday, July 18, 2022 20:46
To: user <user@flink.apache.org>
Subject: flink on yarn job always restart

Hi all,

We have run into a problem with a job that has parallelism 3000 and contains multiple aggregation operators. Whenever the job is restored from a checkpoint or savepoint, it fails to recover and restarts repeatedly.

JM error log:

org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.

Flink version: 1.14.5

Any good ideas for troubleshooting?
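
The log only says that the entrypoint was closed externally, so one way to narrow things down is to check what the JM was doing right before the signal arrived. Below is a minimal sketch for that, assuming the log lines have the same layout as the excerpt above (timestamp, level, logger, `[] -`, message). The function name `summarize_before_signal` and the regular expressions are my own and not part of Flink.

```python
import re
from collections import Counter

# Start of a log entry as it appears in the excerpt above, e.g.
# "2022-07-18 20:31:27,879 INFO org.apache...ClusterEntrypoint [] - message".
# This layout is an assumption; adjust it if your log4j pattern differs.
ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(\w+)\s+(\S+)\s+\[\]\s+-\s+(.*)$"
)
SIGNAL = re.compile(r"RECEIVED SIGNAL (\d+): (\w+)")
TRANSITION = re.compile(r"switched from (\w+) to (\w+)")


def summarize_before_signal(lines):
    """Return (timestamp, signal name, transition counts) for the first
    signal line, or None if the log contains no such line."""
    transitions = Counter()
    for line in lines:
        entry = ENTRY.match(line.strip())
        if entry is None:
            continue  # continuation lines such as stack-trace frames
        timestamp, _level, _logger, message = entry.groups()
        signal = SIGNAL.search(message)
        if signal:
            return timestamp, signal.group(2), dict(transitions)
        transition = TRANSITION.search(message)
        if transition:
            # Count task state changes seen before the signal arrived.
            transitions[(transition.group(1), transition.group(2))] += 1
    return None
```

To use it, pass an open file, for example `summarize_before_signal(open("jobmanager.log"))`, where `jobmanager.log` is a placeholder for the actual JM log path. The returned timestamp can then be compared against the YARN ResourceManager and NodeManager logs for the same moment to see who sent the SIGTERM.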