[ https://issues.apache.org/jira/browse/FLINK-15434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006260#comment-17006260 ]
hailong wang commented on FLINK-15434:
--------------------------------------

I reproduced this problem on my laptop. The debug log is as follows:

{code:java}
[flink-akka.actor.default-dispatcher-4] INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started0 [flink-akka.actor.default-dispatcher-4] INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started72 [flink-akka.actor.default-dispatcher-4] DEBUG akka.event.EventStream - logger log1-Slf4jLogger started79 [flink-akka.actor.default-dispatcher-4] DEBUG akka.event.EventStream - Default Loggers started1165 [main] INFO org.apache.flink.runtime.jobmaster.JobMasterTest - ================================================================================Test testResourceManagerConnectionAfterRegainingLeadership(org.apache.flink.runtime.jobmaster.JobMasterTest) is running.--------------------------------------------------------------------------------4106 [main] INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/jobmanager_0 .4171 [main] INFO org.apache.flink.runtime.jobmaster.JobMaster - Initializing job (unnamed job) (f781d92dbe44f52505eff1b144e45bb6).4268 [main] INFO org.apache.flink.runtime.jobmaster.JobMaster - Using restart strategy NoRestartStrategy for (unnamed job) (f781d92dbe44f52505eff1b144e45bb6).4362 [main] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job recovers via failover strategy: full graph restart4433 [main] INFO org.apache.flink.runtime.jobmaster.JobMaster - Running initialization on master for job (unnamed job) (f781d92dbe44f52505eff1b144e45bb6).4433 [main] INFO org.apache.flink.runtime.jobmaster.JobMaster - Successfully ran initialization on master in 0 ms.4439 [main] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Adding 0 vertices from job graph (unnamed job) (f781d92dbe44f52505eff1b144e45bb6).4439 [main] DEBUG org.apache.flink.runtime.executiongraph.ExecutionGraph - Attaching 0
topologically sorted vertices to existing job graph with 0 vertices and 0 intermediate results.4484 [main] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Successfully created execution graph from job graph (unnamed job) (f781d92dbe44f52505eff1b144e45bb6).4518 [flink-akka.actor.default-dispatcher-4] INFO org.apache.flink.runtime.jobmaster.JobMaster - Starting execution of job (unnamed job) (f781d92dbe44f52505eff1b144e45bb6) under job master id 5b6d296cfdedc261c4bbd525aea971b3.4521 [flink-akka.actor.default-dispatcher-4] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job (unnamed job) (f781d92dbe44f52505eff1b144e45bb6) switched from state CREATED to RUNNING.4533 [flink-akka.actor.default-dispatcher-4] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Trigger heartbeat request.4537 [flink-akka.actor.default-dispatcher-4] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Trigger heartbeat request.4543 [flink-akka.actor.default-dispatcher-4] INFO org.apache.flink.runtime.jobmaster.JobMaster - Connecting to ResourceManager localhost/cc67ffdf-cb14-4f8a-bf85-ddd6f11327b2(5421d4878888c36920c6ee4eb0136801)4556 [flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.jobmaster.JobMaster - Resolved ResourceManager address, beginning registration4557 [flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.jobmaster.JobMaster - Registration at ResourceManager attempt 1 (timeout=100ms)4565 [flink-akka.actor.default-dispatcher-3] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Trigger heartbeat request.4594 [flink-akka.actor.default-dispatcher-3] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Trigger heartbeat request.into 5b6d296cfdedc261c4bbd525aea971b3, f781d92dbe44f52505eff1b144e45bb64619 [flink-akka.actor.default-dispatcher-3] INFO org.apache.flink.runtime.jobmaster.JobMaster - JobManager successfully registered at ResourceManager, leader id: 5421d4878888c36920c6ee4eb0136801.4629 
[flink-akka.actor.default-dispatcher-3] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Trigger heartbeat request.4656 [flink-akka.actor.default-dispatcher-4] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Trigger heartbeat request.4657 [flink-akka.actor.default-dispatcher-4] INFO org.apache.flink.runtime.jobmaster.JobMaster - The heartbeat of ResourceManager with id 4cce21c1eac0cad2675e364a1d0f41c1 timed out.4668 [flink-akka.actor.default-dispatcher-4] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Close ResourceManager connection 4cce21c1eac0cad2675e364a1d0f41c1.org.apache.flink.runtime.jobmaster.JobMasterException: The heartbeat of ResourceManager with id 4cce21c1eac0cad2675e364a1d0f41c1 timed out. at org.apache.flink.runtime.jobmaster.JobMaster$ResourceManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1179) at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:318) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190) at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at 
akka.actor.Actor$class.aroundReceive(Actor.scala:517) at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at akka.actor.ActorCell.invoke(ActorCell.scala:561) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at akka.dispatch.Mailbox.run(Mailbox.scala:225) at akka.dispatch.Mailbox.exec(Mailbox.scala:235) at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)4679 [flink-akka.actor.default-dispatcher-4] INFO org.apache.flink.runtime.jobmaster.JobMaster - Connecting to ResourceManager localhost/cc67ffdf-cb14-4f8a-bf85-ddd6f11327b2(5421d4878888c36920c6ee4eb0136801)4682 [flink-akka.actor.default-dispatcher-6] INFO org.apache.flink.runtime.jobmaster.JobMaster - Resolved ResourceManager address, beginning registration4683 [flink-akka.actor.default-dispatcher-6] INFO org.apache.flink.runtime.jobmaster.JobMaster - Registration at ResourceManager attempt 1 (timeout=100ms)into 5b6d296cfdedc261c4bbd525aea971b3, f781d92dbe44f52505eff1b144e45bb64682 [flink-akka.actor.default-dispatcher-4] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Trigger heartbeat request.4685 [flink-akka.actor.default-dispatcher-4] INFO org.apache.flink.runtime.jobmaster.JobMaster - JobManager successfully registered at ResourceManager, leader id: 5421d4878888c36920c6ee4eb0136801.4686 [flink-akka.actor.default-dispatcher-4] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Trigger heartbeat request.4713 [flink-akka.actor.default-dispatcher-4] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job (unnamed job) (f781d92dbe44f52505eff1b144e45bb6) switched from state RUNNING to SUSPENDED.org.apache.flink.util.FlinkException: Test exception. 
at org.apache.flink.runtime.jobmaster.JobMasterTest.testResourceManagerConnectionAfterRegainingLeadership(JobMasterTest.java:1025) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at 
org.junit.runner.JUnitCore.run(JUnitCore.java:137) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33) at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230) at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)4730 [flink-akka.actor.default-dispatcher-4] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job f781d92dbe44f52505eff1b144e45bb6 has been suspended.4731 [flink-akka.actor.default-dispatcher-4] INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl - Suspending SlotPool.4731 [flink-akka.actor.default-dispatcher-4] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Close ResourceManager connection 4cce21c1eac0cad2675e364a1d0f41c1.org.apache.flink.util.FlinkException: Test exception. at org.apache.flink.runtime.jobmaster.JobMasterTest.testResourceManagerConnectionAfterRegainingLeadership(JobMasterTest.java:1025) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.junit.runner.JUnitCore.run(JUnitCore.java:137) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33) at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230) at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)4733 [flink-akka.actor.default-dispatcher-4] DEBUG org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor - Fencing token not set: Ignoring message LocalFencedMessage(5b6d296cfdedc261c4bbd525aea971b3, org.apache.flink.runtime.rpc.messages.RunAsync@4826bb87) because the fencing token is null.4739 [flink-akka.actor.default-dispatcher-4] DEBUG org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor - Fencing token not set: Ignoring message LocalFencedMessage(5b6d296cfdedc261c4bbd525aea971b3, org.apache.flink.runtime.rpc.messages.RunAsync@5ebd3007) because the fencing token is null.4740 [flink-akka.actor.default-dispatcher-4] DEBUG org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor - Fencing 
token not set: Ignoring message LocalFencedMessage(null, org.apache.flink.runtime.rpc.messages.RunAsync@53eb8b7a) because the fencing token is null.4744 [flink-akka.actor.default-dispatcher-4] INFO org.apache.flink.runtime.jobmaster.JobMaster - Starting execution of job (unnamed job) (f781d92dbe44f52505eff1b144e45bb6) under job master id 8db9e94add813aacf2794eb6881a1573.4747 [flink-akka.actor.default-dispatcher-4] INFO org.apache.flink.runtime.jobmaster.JobMaster - Using restart strategy NoRestartStrategy for (unnamed job) (f781d92dbe44f52505eff1b144e45bb6).4749 [flink-akka.actor.default-dispatcher-4] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job recovers via failover strategy: full graph restart4749 [flink-akka.actor.default-dispatcher-4] INFO org.apache.flink.runtime.jobmaster.JobMaster - Running initialization on master for job (unnamed job) (f781d92dbe44f52505eff1b144e45bb6).4749 [flink-akka.actor.default-dispatcher-4] INFO org.apache.flink.runtime.jobmaster.JobMaster - Successfully ran initialization on master in 0 ms.4749 [flink-akka.actor.default-dispatcher-4] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Adding 0 vertices from job graph (unnamed job) (f781d92dbe44f52505eff1b144e45bb6).4750 [flink-akka.actor.default-dispatcher-4] DEBUG org.apache.flink.runtime.executiongraph.ExecutionGraph - Attaching 0 topologically sorted vertices to existing job graph with 0 vertices and 0 intermediate results.4750 [flink-akka.actor.default-dispatcher-4] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Successfully created execution graph from job graph (unnamed job) (f781d92dbe44f52505eff1b144e45bb6).4755 [flink-akka.actor.default-dispatcher-4] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job (unnamed job) (f781d92dbe44f52505eff1b144e45bb6) switched from state CREATED to RUNNING.4756 [flink-akka.actor.default-dispatcher-4] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Trigger heartbeat 
request.5b6d296cfdedc261c4bbd525aea971b34758 [flink-akka.actor.default-dispatcher-4] INFO org.apache.flink.runtime.jobmaster.JobMaster - Connecting to ResourceManager localhost/cc67ffdf-cb14-4f8a-bf85-ddd6f11327b2(5421d4878888c36920c6ee4eb0136801)4758 [flink-akka.actor.default-dispatcher-4] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Trigger heartbeat request.4758 [flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.jobmaster.JobMaster - Resolved ResourceManager address, beginning registration4758 [flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.jobmaster.JobMaster - Registration at ResourceManager attempt 1 (timeout=100ms)5b6d296cfdedc261c4bbd525aea971b3into 8db9e94add813aacf2794eb6881a1573, f781d92dbe44f52505eff1b144e45bb64760 [flink-akka.actor.default-dispatcher-4] INFO org.apache.flink.runtime.jobmaster.JobMaster - JobManager successfully registered at ResourceManager, leader id: 5421d4878888c36920c6ee4eb0136801.4787 [flink-akka.actor.default-dispatcher-5] INFO org.apache.flink.runtime.jobmaster.JobMaster - The heartbeat of ResourceManager with id 4cce21c1eac0cad2675e364a1d0f41c1 timed out.4787 [flink-akka.actor.default-dispatcher-5] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Close ResourceManager connection 4cce21c1eac0cad2675e364a1d0f41c1.org.apache.flink.runtime.jobmaster.JobMasterException: The heartbeat of ResourceManager with id 4cce21c1eac0cad2675e364a1d0f41c1 timed out. 
at org.apache.flink.runtime.jobmaster.JobMaster$ResourceManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1179) at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:318) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190) at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at akka.actor.Actor$class.aroundReceive(Actor.scala:517) at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at akka.actor.ActorCell.invoke(ActorCell.scala:561) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at akka.dispatch.Mailbox.run(Mailbox.scala:225) at akka.dispatch.Mailbox.exec(Mailbox.scala:235) at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)4788 [flink-akka.actor.default-dispatcher-5] INFO 
org.apache.flink.runtime.jobmaster.JobMaster - Connecting to ResourceManager localhost/cc67ffdf-cb14-4f8a-bf85-ddd6f11327b2(5421d4878888c36920c6ee4eb0136801)4790 [flink-akka.actor.default-dispatcher-5] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Trigger heartbeat request.4790 [flink-akka.actor.default-dispatcher-7] INFO org.apache.flink.runtime.jobmaster.JobMaster - Resolved ResourceManager address, beginning registration4793 [flink-akka.actor.default-dispatcher-7] INFO org.apache.flink.runtime.jobmaster.JobMaster - Registration at ResourceManager attempt 1 (timeout=100ms)into 8db9e94add813aacf2794eb6881a1573, f781d92dbe44f52505eff1b144e45bb64796 [flink-akka.actor.default-dispatcher-5] INFO org.apache.flink.runtime.jobmaster.JobMaster - JobManager successfully registered at ResourceManager, leader id: 5421d4878888c36920c6ee4eb0136801.4796 [flink-akka.actor.default-dispatcher-5] INFO org.apache.flink.runtime.jobmaster.JobMaster - Stopping the JobMaster for job (unnamed job)(f781d92dbe44f52505eff1b144e45bb6).4798 [flink-akka.actor.default-dispatcher-5] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job (unnamed job) (f781d92dbe44f52505eff1b144e45bb6) switched from state RUNNING to SUSPENDED.org.apache.flink.util.FlinkException: JobManager is shutting down. 
at org.apache.flink.runtime.jobmaster.JobMaster.onStop(JobMaster.java:347) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState.terminate(AkkaRpcActor.java:509) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:175) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at akka.actor.Actor$class.aroundReceive(Actor.scala:517) at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at akka.actor.ActorCell.invoke(ActorCell.scala:561) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at akka.dispatch.Mailbox.run(Mailbox.scala:225) at akka.dispatch.Mailbox.exec(Mailbox.scala:235) at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)4803 [flink-akka.actor.default-dispatcher-5] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job f781d92dbe44f52505eff1b144e45bb6 has been suspended.4803 [flink-akka.actor.default-dispatcher-5] INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl - Suspending SlotPool.4804 [flink-akka.actor.default-dispatcher-5] DEBUG org.apache.flink.runtime.jobmaster.JobMaster - Close ResourceManager connection 4cce21c1eac0cad2675e364a1d0f41c1.org.apache.flink.util.FlinkException: JobManager is shutting down. 
at org.apache.flink.runtime.jobmaster.JobMaster.onStop(JobMaster.java:347) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState.terminate(AkkaRpcActor.java:509) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:175) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at akka.actor.Actor$class.aroundReceive(Actor.scala:517) at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at akka.actor.ActorCell.invoke(ActorCell.scala:561) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at akka.dispatch.Mailbox.run(Mailbox.scala:225) at akka.dispatch.Mailbox.exec(Mailbox.scala:235) at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)4805 [flink-akka.actor.default-dispatcher-5] INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl - Stopping SlotPool.4916 [main] ERROR org.apache.flink.runtime.jobmaster.JobMasterTest - --------------------------------------------------------------------------------Test testResourceManagerConnectionAfterRegainingLeadership(org.apache.flink.runtime.jobmaster.JobMasterTest) failed with:java.lang.AssertionError: Expected: <8db9e94add813aacf2794eb6881a1573> but: was <5b6d296cfdedc261c4bbd525aea971b3> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) at 
org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8) at org.apache.flink.runtime.jobmaster.JobMasterTest.testResourceManagerConnectionAfterRegainingLeadership(JobMasterTest.java:1034) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at 
org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.junit.runner.JUnitCore.run(JUnitCore.java:137) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33) at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230) at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58) ================================================================================4942 [main] INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC service. java.lang.AssertionError: Expected: <8db9e94add813aacf2794eb6881a1573> but: was <5b6d296cfdedc261c4bbd525aea971b3>Expected :<8db9e94add813aacf2794eb6881a1573>Actual :<5b6d296cfdedc261c4bbd525aea971b3><Click to see difference> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8) at org.apache.flink.runtime.jobmaster.JobMasterTest.testResourceManagerConnectionAfterRegainingLeadership(JobMasterTest.java:1034) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at 
org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.junit.runner.JUnitCore.run(JUnitCore.java:137) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33) at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230) at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)Process finished with exit code 255
{code}

According to the log, when the heartbeat of the ResourceManager times out, the JobMaster reconnects and registers again under the same JobMasterId. That extra registration leaves the old JobMasterId in registrationQueue, so the test later reads the old id where it expects the new one. I think we can check whether registrationQueue is non-empty and, if so, remove the stale element before offering the new value.
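To make the failure mode concrete, here is a minimal standalone sketch; it is not the actual JobMasterTest code. It assumes that registrationQueue is a bounded queue of capacity 1 (an assumption, the log does not show how the queue is created) and replays the registration order seen in the log, using plain strings for the two JobMasterIds. The class name `RegistrationQueueSketch` and the method `replay` are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

public class RegistrationQueueSketch {
    // JobMasterIds taken from the log above; plain strings stand in for JobMasterId.
    static final String OLD_ID = "5b6d296cfdedc261c4bbd525aea971b3";
    static final String NEW_ID = "8db9e94add813aacf2794eb6881a1573";

    /**
     * Replays the registration order from the log and returns the id the test
     * would read after leadership is regained.
     */
    static String replay(boolean dropStale) {
        // Assumption: capacity 1; the real test's queue may be configured differently.
        BlockingQueue<String> registrationQueue = new ArrayBlockingQueue<>(1);

        Consumer<String> onRegister = dropStale
                ? id -> {
                    // Proposed change: discard a stale element before offering.
                    if (!registrationQueue.isEmpty()) {
                        registrationQueue.poll();
                    }
                    registrationQueue.offer(id);
                }
                // Current behaviour: just offer; a full queue drops the new id.
                : id -> registrationQueue.offer(id);

        onRegister.accept(OLD_ID);   // initial registration
        registrationQueue.poll();    // the test consumes the first registration
        onRegister.accept(OLD_ID);   // heartbeat timeout, re-registration under the old id
        onRegister.accept(NEW_ID);   // leadership regained, registration under the new id
        return registrationQueue.poll();
    }

    public static void main(String[] args) {
        System.out.println("without the change: " + replay(false));
        System.out.println("with the change:    " + replay(true));
    }
}
```

Under these assumptions, `replay(false)` returns the old id, which matches the failed assertion, while `replay(true)` returns the new one. The `isEmpty`/`poll`/`offer` sequence is not atomic, so if registrations can arrive concurrently this may narrow the race rather than remove it.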
{code:java}
testingResourceManagerGateway.setRegisterJobManagerConsumer(
    jobMasterIdResourceIDStringJobIDTuple4 -> {
        // Drop a stale JobMasterId left behind by an earlier re-registration,
        // so the queue only holds the most recent one.
        if (!registrationQueue.isEmpty()) {
            registrationQueue.poll();
        }
        registrationQueue.offer(jobMasterIdResourceIDStringJobIDTuple4.f0);
    });
{code}

> testResourceManagerConnectionAfterRegainingLeadership test fail when run azure
> -------------------------------------------------------------------------------
>
> Key: FLINK-15434
> URL: https://issues.apache.org/jira/browse/FLINK-15434
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Affects Versions: 1.9.1
> Reporter: hailong wang
> Priority: Major
> Fix For: 1.10.0
>
>
> Error message
> Expected: <b65797325db6323dd5f8bdfeeaa18467>
> but: was <57f975298d116a9d7623f5a844ce6502>
>
> Stack trace
> java.lang.AssertionError: Expected: <b65797325db6323dd5f8bdfeeaa18467> but: was <57f975298d116a9d7623f5a844ce6502> at org.apache.flink.runtime.jobmaster.JobMasterTest.testResourceManagerConnectionAfterRegainingLeadership(JobMasterTest.java:1033)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)