[ https://issues.apache.org/jira/browse/FLINK-22663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346568#comment-17346568 ]
Yang Wang commented on FLINK-22663:
-----------------------------------

Thanks for sharing more information.

This issue occurs only when the JobManager and at least one TaskManager are running on the dead NodeManager. The root cause is that {{NMClientImpl#stop}} tries to clean up all running containers, including the ones on the dead NodeManager.

Actually, I am afraid this is not a bug in Flink: Flink has deregistered the application successfully, but YARN did not kill all the containers. A possible improvement might be to not clean up the running containers when stopping the NM client. However, that also has side effects.

Moreover, the reason why "yarn application -kill appid" works is that the JobManager receives a shutdown request from the YARN ResourceManager and then kills itself. I believe you could find some logs like "ResourceManagerException: Received shutdown request from YARN ResourceManager".

> Release YARN resource very slow when cancel the job after some NodeManagers shutdown
> ------------------------------------------------------------------------------------
>
> Key: FLINK-22663
> URL: https://issues.apache.org/jira/browse/FLINK-22663
> Project: Flink
> Issue Type: Bug
> Components: Deployment / YARN
> Affects Versions: 1.12.2
> Reporter: Jinhong Liu
> Priority: Major
> Labels: YARN
>
> When I test Flink on YARN, there is a case that may cause some problems.
> Hadoop Version: 2.7.3
> Flink Version: 1.12.2
> I deploy a Flink job on YARN. While the job is running I stop one NodeManager;
> after one or two minutes, the job recovers automatically. But in this situation,
> if I cancel the job, the containers cannot be released immediately: some
> containers are still running, including the app master. About 5 minutes later,
> these containers exit, and about 10 minutes later the app master exits.
> I checked the log of the app master; it seems to try to stop the containers on
> the NodeManager which I have already stopped.
> {code:java} > 2021-05-14 06:15:17,389 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job class > tv.freewheel.reporting.fastlane.Fastlane$ (da883ab39a7a82e4d45a3803bc77dd6f) > switched from state CANCELLING to CANCELED. > 2021-05-14 06:15:17,389 INFO > org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Stopping > checkpoint coordinator for job da883ab39a7a82e4d45a3803bc77dd6f. > 2021-05-14 06:15:17,390 INFO > org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore [] - > Shutting down > 2021-05-14 06:15:17,408 INFO > org.apache.flink.runtime.dispatcher.MiniDispatcher [] - Job > da883ab39a7a82e4d45a3803bc77dd6f reached globally terminal state CANCELED. > 2021-05-14 06:15:17,409 INFO > org.apache.flink.runtime.dispatcher.MiniDispatcher [] - Shutting > down cluster with state CANCELED, jobCancelled: true, executionMode: DETACHED > 2021-05-14 06:15:17,409 INFO > org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Shutting > YarnJobClusterEntrypoint down with application status CANCELED. Diagnostics > null. > 2021-05-14 06:15:17,409 INFO > org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Shutting > down rest endpoint. > 2021-05-14 06:15:17,420 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Stopping the JobMaster for job class > tv.freewheel.reporting.fastlane.Fastlane$(da883ab39a7a82e4d45a3803bc77dd6f). > 2021-05-14 06:15:17,422 INFO > org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Removing > cache directory > /tmp/flink-web-af72a00c-0ddd-4e5e-a62c-8244d6caa552/flink-web-ui > 2021-05-14 06:15:17,432 INFO > org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - > http://ip-10-23-19-197.ec2.internal:43811 lost leadership > 2021-05-14 06:15:17,432 INFO > org.apache.flink.runtime.jobmaster.MiniDispatcherRestEndpoint [] - Shut down > complete. 
> 2021-05-14 06:15:17,436 INFO > org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - > Shut down cluster because application is in CANCELED, diagnostics null. > 2021-05-14 06:15:17,436 INFO org.apache.flink.yarn.YarnResourceManagerDriver > [] - Unregister application from the YARN Resource Manager with > final status KILLED. > 2021-05-14 06:15:17,458 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Suspending > SlotPool. > 2021-05-14 06:15:17,458 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Close ResourceManager connection > 493862ba148679a4f16f7de5ffaef665: Stopping JobMaster for job class > tv.freewheel.reporting.fastlane.Fastlane$(da883ab39a7a82e4d45a3803bc77dd6f).. > 2021-05-14 06:15:17,458 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Stopping > SlotPool. > 2021-05-14 06:15:17,482 INFO > org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl [] - Waiting for > application to be successfully unregistered. > 2021-05-14 06:15:17,566 INFO org.apache.flink.runtime.history.FsJobArchivist > [] - Job da883ab39a7a82e4d45a3803bc77dd6f has been archived at > hdfs:/realtime/flink-archive/da883ab39a7a82e4d45a3803bc77dd6f. > 2021-05-14 06:15:17,589 INFO > org.apache.flink.runtime.entrypoint.component.DispatcherResourceManagerComponent > [] - Closing components. > 2021-05-14 06:15:17,590 INFO > org.apache.flink.runtime.dispatcher.runner.JobDispatcherLeaderProcess [] - > Stopping JobDispatcherLeaderProcess. > 2021-05-14 06:15:17,590 INFO > org.apache.flink.runtime.dispatcher.MiniDispatcher [] - Stopping > dispatcher > akka.tcp://flink@ip-10-23-19-197.ec2.internal:40340/user/rpc/dispatcher_1. > 2021-05-14 06:15:17,590 INFO > org.apache.flink.runtime.dispatcher.MiniDispatcher [] - Stopping > all currently running jobs of dispatcher > akka.tcp://flink@ip-10-23-19-197.ec2.internal:40340/user/rpc/dispatcher_1. 
> 2021-05-14 06:15:17,591 INFO > org.apache.flink.runtime.rest.handler.legacy.backpressure.BackPressureRequestCoordinator > [] - Shutting down back pressure request coordinator. > 2021-05-14 06:15:17,591 INFO > org.apache.flink.runtime.dispatcher.MiniDispatcher [] - Stopped > dispatcher > akka.tcp://flink@ip-10-23-19-197.ec2.internal:40340/user/rpc/dispatcher_1. > 2021-05-14 06:15:17,594 INFO > org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - > Disconnect job manager > 00000000000000000000000000000...@akka.tcp://flink@ip-10-23-19-197.ec2.internal:40340/user/rpc/jobmanager_2 > for job da883ab39a7a82e4d45a3803bc77dd6f from the resource manager. > 2021-05-14 06:15:17,600 INFO > org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl [] - > Interrupted while waiting for queue > java.lang.InterruptedException: null > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014) > ~[?:1.8.0_161] > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048) > ~[?:1.8.0_161] > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > ~[?:1.8.0_161] > at > org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:287) > [flink-shaded-hadoop-2-uber-2.7.5-7.0.jar:2.7.5-7.0] > 2021-05-14 06:15:17,648 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-27-242.ec2.internal:41435 > 2021-05-14 06:15:17,699 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-28-67.ec2.internal:38916 > 2021-05-14 06:15:17,741 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-24-139.ec2.internal:42226 > 2021-05-14 06:15:17,796 INFO > 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-24-71.ec2.internal:44804 > 2021-05-14 06:15:17,809 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-27-242.ec2.internal:41435 > 2021-05-14 06:15:17,813 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-28-67.ec2.internal:38916 > 2021-05-14 06:15:17,817 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-27-242.ec2.internal:41435 > 2021-05-14 06:15:17,822 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-24-71.ec2.internal:44804 > 2021-05-14 06:15:17,879 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-19-86.ec2.internal:45099 > 2021-05-14 06:15:17,889 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-19-197.ec2.internal:44443 > 2021-05-14 06:15:17,898 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-24-139.ec2.internal:42226 > 2021-05-14 06:15:17,903 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-24-139.ec2.internal:42226 > 2021-05-14 06:15:17,907 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-19-86.ec2.internal:45099 > 2021-05-14 06:15:17,911 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-28-67.ec2.internal:38916 > 2021-05-14 06:15:17,960 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-25-241.ec2.internal:42723 > 2021-05-14 06:15:17,964 WARN akka.remote.ReliableDeliverySupervisor > [] - 
Association with remote system > [akka.tcp://flink@ip-10-23-27-242.ec2.internal:43814] has failed, address is > now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:17,964 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink-metrics@ip-10-23-27-242.ec2.internal:38826] has failed, > address is now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,016 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink@ip-10-23-28-67.ec2.internal:45022] has failed, address is > now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,016 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink-metrics@ip-10-23-28-67.ec2.internal:40808] has failed, > address is now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,061 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink@ip-10-23-24-139.ec2.internal:33912] has failed, address is > now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,061 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink-metrics@ip-10-23-24-139.ec2.internal:44652] has failed, > address is now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,120 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink-metrics@ip-10-23-24-71.ec2.internal:42454] has failed, > address is now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,120 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink@ip-10-23-24-71.ec2.internal:41756] has failed, address is > now gated for [50] ms. 
Reason: [Disassociated] > 2021-05-14 06:15:18,125 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink@ip-10-23-27-242.ec2.internal:36652] has failed, address is > now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,125 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink-metrics@ip-10-23-27-242.ec2.internal:37709] has failed, > address is now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,126 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink@ip-10-23-28-67.ec2.internal:40308] has failed, address is > now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,126 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink-metrics@ip-10-23-28-67.ec2.internal:34524] has failed, > address is now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,141 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink-metrics@ip-10-23-27-242.ec2.internal:44435] has failed, > address is now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,141 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink@ip-10-23-27-242.ec2.internal:37224] has failed, address is > now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,143 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink@ip-10-23-24-71.ec2.internal:38940] has failed, address is > now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,143 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink-metrics@ip-10-23-24-71.ec2.internal:33014] has failed, > address is now gated for [50] ms. 
Reason: [Disassociated] > 2021-05-14 06:15:18,202 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink-metrics@ip-10-23-19-86.ec2.internal:39939] has failed, > address is now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,202 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink@ip-10-23-19-86.ec2.internal:35165] has failed, address is > now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,204 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink@ip-10-23-19-197.ec2.internal:45913] has failed, address is > now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,204 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink-metrics@ip-10-23-19-197.ec2.internal:36333] has failed, > address is now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,220 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink-metrics@ip-10-23-28-67.ec2.internal:35366] has failed, > address is now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,220 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink@ip-10-23-28-67.ec2.internal:45411] has failed, address is > now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,223 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink-metrics@ip-10-23-24-139.ec2.internal:34759] has failed, > address is now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,223 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink@ip-10-23-24-139.ec2.internal:42621] has failed, address is > now gated for [50] ms. 
Reason: [Disassociated] > 2021-05-14 06:15:18,228 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink@ip-10-23-19-86.ec2.internal:40782] has failed, address is > now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,228 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink-metrics@ip-10-23-19-86.ec2.internal:36612] has failed, > address is now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,251 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink-metrics@ip-10-23-24-139.ec2.internal:38342] has failed, > address is now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:15:18,251 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink@ip-10-23-24-139.ec2.internal:36176] has failed, address is > now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:18:18,220 ERROR > org.apache.hadoop.yarn.client.api.impl.NMClientImpl [] - Failed to > stop Container container_1620970870707_0001_01_000057when stopping > NMClientImpl > 2021-05-14 06:18:18,240 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-19-86.ec2.internal:45099 > 2021-05-14 06:18:18,295 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-24-71.ec2.internal:44804 > 2021-05-14 06:18:18,299 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-25-241.ec2.internal:42723 > 2021-05-14 06:18:18,557 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink@ip-10-23-19-86.ec2.internal:44399] has failed, address is > now gated for [50] ms. 
Reason: [Disassociated] > 2021-05-14 06:18:18,557 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink-metrics@ip-10-23-19-86.ec2.internal:33246] has failed, > address is now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:18:18,611 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink@ip-10-23-24-71.ec2.internal:39100] has failed, address is > now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:18:18,611 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink-metrics@ip-10-23-24-71.ec2.internal:35428] has failed, > address is now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:21:01,684 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink@ip-10-23-27-242.ec2.internal:41510] has failed, address is > now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:21:01,730 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink@ip-10-23-19-197.ec2.internal:39595] has failed, address is > now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:21:01,741 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink@ip-10-23-24-71.ec2.internal:46788] has failed, address is > now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:21:01,754 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink-metrics@ip-10-23-24-71.ec2.internal:46748] has failed, > address is now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:21:01,754 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink-metrics@ip-10-23-24-71.ec2.internal:34218] has failed, > address is now gated for [50] ms. 
Reason: [Disassociated] > 2021-05-14 06:21:01,761 WARN akka.remote.ReliableDeliverySupervisor > [] - Association with remote system > [akka.tcp://flink@ip-10-23-19-86.ec2.internal:42730] has failed, address is > now gated for [50] ms. Reason: [Disassociated] > 2021-05-14 06:21:18,522 ERROR > org.apache.hadoop.yarn.client.api.impl.NMClientImpl [] - Failed to > stop Container container_1620970870707_0001_01_000078when stopping > NMClientImpl > 2021-05-14 06:21:18,567 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-28-67.ec2.internal:38916 > 2021-05-14 06:21:18,571 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-19-197.ec2.internal:44443 > 2021-05-14 06:21:18,605 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-27-242.ec2.internal:41435 > 2021-05-14 06:21:18,657 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-24-71.ec2.internal:44804 > 2021-05-14 06:21:18,698 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-19-86.ec2.internal:45099 > 2021-05-14 06:21:18,702 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-27-242.ec2.internal:41435 > 2021-05-14 06:21:18,705 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-19-197.ec2.internal:44443 > 2021-05-14 06:21:18,707 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-25-241.ec2.internal:42723 > 2021-05-14 06:24:18,934 ERROR > org.apache.hadoop.yarn.client.api.impl.NMClientImpl [] - Failed to > stop Container container_1620970870707_0001_01_000008when stopping > NMClientImpl > 2021-05-14 06:24:18,986 INFO > 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-19-86.ec2.internal:45099 > 2021-05-14 06:24:19,036 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-24-139.ec2.internal:42226 > 2021-05-14 06:24:19,077 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-24-71.ec2.internal:44804 > 2021-05-14 06:24:19,080 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-24-139.ec2.internal:42226 > 2021-05-14 06:24:19,083 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-25-241.ec2.internal:42723 > 2021-05-14 06:27:19,303 ERROR > org.apache.hadoop.yarn.client.api.impl.NMClientImpl [] - Failed to > stop Container container_1620970870707_0001_01_000029when stopping > NMClientImpl > 2021-05-14 06:27:19,349 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-28-67.ec2.internal:38916 > 2021-05-14 06:27:19,353 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-28-67.ec2.internal:38916 > 2021-05-14 06:27:19,402 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-24-71.ec2.internal:44804 > 2021-05-14 06:27:19,466 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-24-139.ec2.internal:42226 > 2021-05-14 06:27:19,470 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-24-71.ec2.internal:44804 > 2021-05-14 06:27:19,504 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-27-242.ec2.internal:41435 > 2021-05-14 06:27:19,508 INFO > 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-27-242.ec2.internal:41435 > 2021-05-14 06:27:19,510 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-19-197.ec2.internal:44443 > 2021-05-14 06:27:19,545 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-19-86.ec2.internal:45099 > 2021-05-14 06:27:19,548 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-24-71.ec2.internal:44804 > 2021-05-14 06:27:19,551 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-19-86.ec2.internal:45099 > 2021-05-14 06:27:19,554 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-24-139.ec2.internal:42226 > 2021-05-14 06:27:19,557 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-19-197.ec2.internal:44443 > 2021-05-14 06:27:19,559 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy [] - > Opening proxy : ip-10-23-25-241.ec2.internal:42723 > 2021-05-14 06:27:50,793 INFO > org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - RECEIVED > SIGNAL 15: SIGTERM. Shutting down as requested. > 2021-05-14 06:27:50,794 INFO org.apache.flink.runtime.blob.BlobServer > [] - Stopped BLOB server at 0.0.0.0:44447 > {code}
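
A side note on the timing in the quoted log: the four "Failed to stop Container ... when stopping NMClientImpl" errors are roughly three minutes apart, which would be consistent with each stop attempt against the dead NodeManager blocking until some timeout expires before {{NMClientImpl#stop}} moves on. The sketch below only measures those gaps from the timestamps in the log. The class and method names ({{StopDelay}}, {{gapsInSeconds}}) are made up for illustration, and which Hadoop setting produces the delay is an open question in this thread, not something the code determines.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Flink or Hadoop.
public class StopDelay {
    // Timestamp layout used by the log lines quoted in the issue.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Returns the whole-second gap between each pair of consecutive timestamps.
    static List<Long> gapsInSeconds(List<String> timestamps) {
        List<Long> gaps = new ArrayList<>();
        for (int i = 1; i < timestamps.size(); i++) {
            LocalDateTime previous = LocalDateTime.parse(timestamps.get(i - 1), FMT);
            LocalDateTime current = LocalDateTime.parse(timestamps.get(i), FMT);
            gaps.add(Duration.between(previous, current).getSeconds());
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Timestamps of the four NMClientImpl ERROR lines, copied from the log.
        List<String> failedStops = Arrays.asList(
                "2021-05-14 06:18:18,220",
                "2021-05-14 06:21:18,522",
                "2021-05-14 06:24:18,934",
                "2021-05-14 06:27:19,303");

        // Prints [180, 180, 180]: about three minutes per failed stop.
        System.out.println(gapsInSeconds(failedStops));
    }
}
```

With four containers on the dead NodeManager, gaps of this size would add up to the delay of roughly ten minutes reported in the issue before the app master finally exits.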