[ https://issues.apache.org/jira/browse/FLINK-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549139#comment-16549139 ]
ASF GitHub Bot commented on FLINK-9838:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/6373

    [FLINK-9838][logging] Don't log slot request failures on the ResourceManager

    ## What is the purpose of the change

    Decrease log cluttering by not logging slot request failures on the `ResourceManager`.

    ## Verifying this change

    - Verified manually

    ## Does this pull request potentially affect one of the following parts:

      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
      - The S3 file system connector: (no)

    ## Documentation

      - Does this pull request introduce a new feature? (no)
      - If yes, how is the feature documented? (not applicable)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink fixSlotAllocationFailureLogging

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6373.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6373

----
commit f02f0231ba1fe047086fdf90227bf4dac9697d87
Author: Till Rohrmann <trohrmann@...>
Date:   2018-07-19T11:07:44Z

    [FLINK-9838][logging] Don't log slot request failures on the ResourceManager

----

> Slot request failed Exceptions after completing a job
> -----------------------------------------------------
>
>                 Key: FLINK-9838
>                 URL: https://issues.apache.org/jira/browse/FLINK-9838
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.5.1, 1.6.0
>            Reporter: Nico Kruber
>            Assignee: Till Rohrmann
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, after a job finished, e.g. the following one, several exceptions are logged (at INFO level) about failed slot requests although the job has run successfully.
> {code}
> StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
> env.fromElements(1, 2, 3, 4).print();
> env.execute();
> {code}
> {code}
> 16:28:16,106 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Closing the SlotManager.
> 16:28:16,106 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Suspending the SlotManager.
> 16:28:16,106 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Unregister TaskManager aa20e76adb9aee0cdadc50dbc06ea208 from the SlotManager.
> 16:28:16,107 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Slot request with allocation id f99ff6d66f7bc618a9ee6e9470e0cdb1 for job 1bdaafd1072e210790790b99e7741b6a failed.
> org.apache.flink.util.FlinkException: The assigned slot b21f8807-5d0a-4e53-9e55-b6522b4a41c0_0 was removed.
> 	at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:786)
> 	at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:756)
> 	at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:948)
> 	at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:372)
> 	at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.suspend(SlotManager.java:234)
> 	at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.close(SlotManager.java:251)
> 	at org.apache.flink.runtime.resourcemanager.ResourceManager.postStop(ResourceManager.java:224)
> 	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.postStop(AkkaRpcActor.java:105)
> 	at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.postStop(FencedAkkaRpcActor.java:40)
> 	at akka.actor.Actor$class.aroundPostStop(Actor.scala:515)
> 	at akka.actor.UntypedActor.aroundPostStop(UntypedActor.scala:95)
> 	at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
> 	at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
> 	at akka.actor.ActorCell.terminate(ActorCell.scala:374)
> 	at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:467)
> 	at akka.actor.ActorCell.systemInvoke(ActorCell.scala:483)
> 	at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:282)
> 	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:260)
> 	at akka.dispatch.Mailbox.run(Mailbox.scala:224)
> 	at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
> 	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 16:28:16,109 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor - Stopping TaskExecutor akka://flink/user/taskmanager_0.
> 16:28:16,110 INFO  org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager - Shutting down TaskExecutorLocalStateStoresManager.
> 16:28:16,109 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Stopping dispatcher akka://flink/user/dispatcher421f3c27-5248-40d4-b219-f0c23480bd6f.
> 16:28:16,111 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Stopping all currently running jobs of dispatcher akka://flink/user/dispatcher421f3c27-5248-40d4-b219-f0c23480bd6f.
> ...
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)