Everything earlier in the log was job deployment output. Then the JM suddenly 
received SIGNAL 15 (SIGTERM), and all components started shutting down.


2022-07-18 20:31:27,813 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - theOpeartion 
namexxxxx (839/3000) (f913468fb654c6d2c3466ef28d296396) switched from 
INITIALIZING to RUNNING.
2022-07-18 20:31:27,879 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - RECEIVED 
SIGNAL 15: SIGTERM. Shutting down as requested.
2022-07-18 20:31:27,879 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - theOpeartion 
namexxxxx (2093/3000) (0a85fc4259b8e99dee2e9761131cac51) switched from 
DEPLOYING to INITIALIZING.
2022-07-18 20:31:27,882 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Shutting 
YarnJobClusterEntrypoint down with application status UNKNOWN. Diagnostics 
Cluster entrypoint has been closed externally..
2022-07-18 20:31:27,884 INFO  
org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Shutting
down rest endpoint.
2022-07-18 20:31:27,886 INFO  org.apache.flink.runtime.blob.BlobServer          
           [] - Stopped BLOB server at 0.0.0.0:45857
2022-07-18 20:31:27,988 INFO  
org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Removing
cache directory /tmp/flink-web-3c48a98c-6642-40de-a52a-1794256c4495/flink-web-ui
2022-07-18 20:31:27,991 INFO  
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
Stopping DefaultLeaderElectionService.
2022-07-18 20:31:27,991 INFO  
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver [] - 
Closing 
ZooKeeperLeaderElectionDriver{leaderLatchPath='/leader/rest_server/latch', 
connectionInformationPath='/leader/rest_server/connection_info'}
2022-07-18 20:31:28,001 INFO  
org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Shut down 
complete.
2022-07-18 20:31:28,001 INFO  
org.apache.flink.runtime.entrypoint.component.DispatcherResourceManagerComponent
 [] - Closing components.
2022-07-18 20:31:28,001 INFO  
org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
Stopping DefaultLeaderRetrievalService.
2022-07-18 20:31:28,001 INFO  
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver [] - 
Closing 
ZookeeperLeaderRetrievalDriver{connectionInformationPath='/leader/dispatcher/connection_info'}.
2022-07-18 20:31:28,001 INFO  
org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
Stopping DefaultLeaderRetrievalService.
2022-07-18 20:31:28,001 INFO  
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalDriver [] - 
Closing 
ZookeeperLeaderRetrievalDriver{connectionInformationPath='/leader/resource_manager/connection_info'}.
2022-07-18 20:31:28,001 INFO  
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
Stopping DefaultLeaderElectionService.
2022-07-18 20:31:28,001 INFO  
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionDriver [] - 
Closing 
ZooKeeperLeaderElectionDriver{leaderLatchPath='/leader/dispatcher/latch', 
connectionInformationPath='/leader/dispatcher/connection_info'}
2022-07-18 20:31:28,020 INFO  
org.apache.flink.runtime.dispatcher.MiniDispatcher           [] - Stopping 
dispatcher akka.tcp://flink@ip:43728/user/rpc/dispatcher_0.

2022-07-18 20:31:28,023 INFO  
org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl [] - 
Interrupted while waiting for queue
java.lang.InterruptedException: null
       at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
 ~[?:1.8.0_322]
       at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
 ~[?:1.8.0_322]
       at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) 
~[?:1.8.0_322]
       at 
org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:323)
 [hadoop-yarn-client-2.8.5.jar:?]
2022-07-18 20:31:28,029 INFO  org.apache.flink.runtime.jobmaster.JobMaster      
           [] - Stopping the JobMaster for job 'flink_test' 
(7a9a02f6aa168abc927732855e3d230f).
2022-07-18 20:31:28,067 INFO  
org.apache.flink.runtime.dispatcher.MiniDispatcher           [] - Job 
7a9a02f6aa168abc927732855e3d230f reached terminal state SUSPENDED.
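
To check what the JM logged immediately before the SIGTERM, a small script can 
pull out the entries leading up to it. This is only a rough sketch: it assumes 
the full JM log is a plain-text file with one entry per line, each starting 
with a timestamp like the ones above, and the function name 
context_before_sigterm is made up for illustration.

```python
import re
from collections import deque

# Timestamp at the start of a log entry, e.g. "2022-07-18 20:31:27,879".
# Assumption: the JM log uses this layout, as in the excerpt above.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def context_before_sigterm(lines, context=5):
    """Return (sigterm_line, preceding_entries) for the first SIGTERM entry.

    preceding_entries holds up to `context` timestamped entries that came
    just before the SIGTERM line. Returns None if no SIGTERM entry is found.
    """
    recent = deque(maxlen=context)  # sliding window of the latest entries
    for line in lines:
        line = line.rstrip("\n")
        if "RECEIVED SIGNAL 15" in line:
            return line, list(recent)
        # Only timestamped lines count as entries; stack-trace lines are skipped.
        if TIMESTAMP.match(line):
            recent.append(line)
    return None
```

It can be fed an open file object, for example 
context_before_sigterm(open("jobmanager.log")), where jobmanager.log is a 
placeholder for wherever the JM log actually lives.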



---- Replied Message ----
| From | Zhanghao Chen<zhanghao.c...@outlook.com> |
| Date | 07/18/2022 21:19 |
| To | SmileSmile<a511955...@163.com>, user<user@flink.apache.org> |
| Cc | |
| Subject | Re: flink on yarn job always restart |
Hi, could you provide the whole JM log?


Best,
Zhanghao Chen
From: SmileSmile <a511955...@163.com>
Sent: Monday, July 18, 2022 20:46
To: user <user@flink.apache.org>
Subject: flink on yarn job always restart
 
Hi all,
We have run into a problem with a job that runs at parallelism 3000 and 
contains multiple aggregation operations. Whenever the job is restored from a 
checkpoint or savepoint, it fails to recover and restarts repeatedly.
The JM log shows:
org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - RECEIVED 
SIGNAL 15: SIGTERM. Shutting down as requested.
The Flink version is 1.14.5.
Does anyone have ideas for troubleshooting this?




