huweihua created FLINK-17341: -------------------------------- Summary: freeSlot in TaskExecutor.closeJobManagerConnection cause ConcurrentModificationException Key: FLINK-17341 URL: https://issues.apache.org/jira/browse/FLINK-17341 Project: Flink Issue Type: Bug Reporter: huweihua
TaskExecutor may freeSlot when closeJobManagerConnection. freeSlot will modify the TaskSlotTable.slotsPerJob. this modify will cause ConcurrentModificationException. {code:java} Iterator<AllocationID> activeSlots = taskSlotTable.getActiveSlots(jobId); final FlinkException freeingCause = new FlinkException("Slot could not be marked inactive."); while (activeSlots.hasNext()) { AllocationID activeSlot = activeSlots.next(); try { if (!taskSlotTable.markSlotInactive(activeSlot, taskManagerConfiguration.getTimeout())) { freeSlotInternal(activeSlot, freeingCause); } } catch (SlotNotFoundException e) { log.debug("Could not mark the slot {} inactive.", jobId, e); } } {code} error log: {code:java} 2020-04-21 23:37:11,363 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor - Caught exception while executing runnable in main thread. java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437) at java.util.HashMap$KeyIterator.next(HashMap.java:1461) at org.apache.flink.runtime.taskexecutor.slot.TaskSlotTable$TaskSlotIterator.hasNext(TaskSlotTable.java:698) at org.apache.flink.runtime.taskexecutor.slot.TaskSlotTable$AllocationIDIterator.hasNext(TaskSlotTable.java:652) at org.apache.flink.runtime.taskexecutor.TaskExecutor.closeJobManagerConnection(TaskExecutor.java:1314) at org.apache.flink.runtime.taskexecutor.TaskExecutor.access$1300(TaskExecutor.java:149) at org.apache.flink.runtime.taskexecutor.TaskExecutor$JobLeaderListenerImpl.lambda$jobManagerLostLeadership$1(TaskExecutor.java:1726) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)