[ https://issues.apache.org/jira/browse/FLINK-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210588#comment-17210588 ]
huweihua commented on FLINK-17341: ---------------------------------- I think we can put the slots that need to be free in a list, and then free them out of the loop. [~trohrmann] > freeSlot in TaskExecutor.closeJobManagerConnection cause > ConcurrentModificationException > ---------------------------------------------------------------------------------------- > > Key: FLINK-17341 > URL: https://issues.apache.org/jira/browse/FLINK-17341 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination > Affects Versions: 1.9.0, 1.10.2, 1.11.2 > Reporter: huweihua > Assignee: Matthias > Priority: Major > Labels: pull-request-available > Fix For: 1.12.0, 1.10.3, 1.11.3 > > > TaskExecutor may freeSlot when closeJobManagerConnection. freeSlot will > modify the TaskSlotTable.slotsPerJob. this modify will cause > ConcurrentModificationException. > {code:java} > Iterator<AllocationID> activeSlots = taskSlotTable.getActiveSlots(jobId); > final FlinkException freeingCause = new FlinkException("Slot could not be > marked inactive."); > while (activeSlots.hasNext()) { > AllocationID activeSlot = activeSlots.next(); > try { > if (!taskSlotTable.markSlotInactive(activeSlot, > taskManagerConfiguration.getTimeout())) { > freeSlotInternal(activeSlot, freeingCause); > } > } catch (SlotNotFoundException e) { > log.debug("Could not mark the slot {} inactive.", jobId, e); > } > } > {code} > error log: > {code:java} > 2020-04-21 23:37:11,363 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor > - Caught exception while executing runnable in main thread. > java.util.ConcurrentModificationException > at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437) > at java.util.HashMap$KeyIterator.next(HashMap.java:1461) > at > org.apache.flink.runtime.taskexecutor.slot.TaskSlotTable$TaskSlotIterator.hasNext(TaskSlotTable.java:698) > at > org.apache.flink.runtime.taskexecutor.slot.TaskSlotTable$AllocationIDIterator.hasNext(TaskSlotTable.java:652) > at > org.apache.flink.runtime.taskexecutor.TaskExecutor.closeJobManagerConnection(TaskExecutor.java:1314) > at > org.apache.flink.runtime.taskexecutor.TaskExecutor.access$1300(TaskExecutor.java:149) > at > org.apache.flink.runtime.taskexecutor.TaskExecutor$JobLeaderListenerImpl.lambda$jobManagerLostLeadership$1(TaskExecutor.java:1726) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)