[ https://issues.apache.org/jira/browse/FLINK-19391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arvid Heise updated FLINK-19391:
--------------------------------
    Priority: Blocker  (was: Major)

> Deadlock during partition update
> --------------------------------
>
>                 Key: FLINK-19391
>                 URL: https://issues.apache.org/jira/browse/FLINK-19391
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.12.0
>            Reporter: Arvid Heise
>            Assignee: Arvid Heise
>            Priority: Blocker
>
> Master cron job is currently failing because of a deadlock introduced in
> FLINK-19026.
> {noformat}
> 2020-09-23T21:50:39.2444176Z Found one Java-level deadlock:
> 2020-09-23T21:50:39.2444633Z =============================
> 2020-09-23T21:50:39.2445001Z "Temp writer":
> 2020-09-23T21:50:39.2445484Z   waiting to lock monitor 0x00007f4e14004ca8 (object 0x0000000086501948, a java.lang.Object),
> 2020-09-23T21:50:39.2446418Z   which is held by "flink-akka.actor.default-dispatcher-2"
> 2020-09-23T21:50:39.2447193Z "flink-akka.actor.default-dispatcher-2":
> 2020-09-23T21:50:39.2447903Z   waiting to lock monitor 0x00007f4e14004bf8 (object 0x0000000086501930, a org.apache.flink.runtime.io.network.partition.PrioritizedDeque),
> 2020-09-23T21:50:39.2448703Z   which is held by "Temp writer"
> 2020-09-23T21:50:39.2448965Z
> 2020-09-23T21:50:39.2449384Z Java stack information for the threads listed above:
> 2020-09-23T21:50:39.2449900Z ===================================================
> 2020-09-23T21:50:39.2450325Z "Temp writer":
> 2020-09-23T21:50:39.2451050Z 	at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.checkAndWaitForSubpartitionView(LocalInputChannel.java:244)
> 2020-09-23T21:50:39.2452264Z 	- waiting to lock <0x0000000086501948> (a java.lang.Object)
> 2020-09-23T21:50:39.2453183Z 	at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.getNextBuffer(LocalInputChannel.java:205)
> 2020-09-23T21:50:39.2454173Z 	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.waitAndGetNextData(SingleInputGate.java:642)
> 2020-09-23T21:50:39.2455422Z 	- locked <0x0000000086501930> (a org.apache.flink.runtime.io.network.partition.PrioritizedDeque)
> 2020-09-23T21:50:39.2456310Z 	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:619)
> 2020-09-23T21:50:39.2457311Z 	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNext(SingleInputGate.java:602)
> 2020-09-23T21:50:39.2458205Z 	at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.getNext(InputGateWithMetrics.java:105)
> 2020-09-23T21:50:39.2459258Z 	at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:100)
> 2020-09-23T21:50:39.2460465Z 	at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:47)
> 2020-09-23T21:50:39.2461344Z 	at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:59)
> 2020-09-23T21:50:39.2462164Z 	at org.apache.flink.runtime.operators.TempBarrier$TempWritingThread.run(TempBarrier.java:178)
> 2020-09-23T21:50:39.2463418Z "flink-akka.actor.default-dispatcher-2":
> 2020-09-23T21:50:39.2464109Z 	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.queueChannel(SingleInputGate.java:825)
> 2020-09-23T21:50:39.2465336Z 	- waiting to lock <0x0000000086501930> (a org.apache.flink.runtime.io.network.partition.PrioritizedDeque)
> 2020-09-23T21:50:39.2466228Z 	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.notifyChannelNonEmpty(SingleInputGate.java:791)
> 2020-09-23T21:50:39.2467222Z 	at org.apache.flink.runtime.io.network.partition.consumer.InputChannel.notifyChannelNonEmpty(InputChannel.java:154)
> 2020-09-23T21:50:39.2468212Z 	at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.notifyDataAvailable(LocalInputChannel.java:236)
> 2020-09-23T21:50:39.2469577Z 	at org.apache.flink.runtime.io.network.partition.ResultPartitionManager.createSubpartitionView(ResultPartitionManager.java:76)
> 2020-09-23T21:50:39.2470607Z 	at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.requestSubpartition(LocalInputChannel.java:133)
> 2020-09-23T21:50:39.2471765Z 	- locked <0x0000000086501948> (a java.lang.Object)
> 2020-09-23T21:50:39.2472685Z 	at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.updateInputChannel(SingleInputGate.java:489)
> 2020-09-23T21:50:39.2473727Z 	- locked <0x0000000086532500> (a java.lang.Object)
> 2020-09-23T21:50:39.2474449Z 	at org.apache.flink.runtime.io.network.NettyShuffleEnvironment.updatePartitionInfo(NettyShuffleEnvironment.java:279)
> 2020-09-23T21:50:39.2475394Z 	at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$updatePartitions$12(TaskExecutor.java:758)
> 2020-09-23T21:50:39.2476235Z 	at org.apache.flink.runtime.taskexecutor.TaskExecutor$$Lambda$406/1860601696.run(Unknown Source)
> 2020-09-23T21:50:39.2476973Z 	at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
> 2020-09-23T21:50:39.2477714Z 	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> 2020-09-23T21:50:39.2478698Z 	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
> 2020-09-23T21:50:39.2479506Z 	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 2020-09-23T21:50:39.2480263Z 	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 2020-09-23T21:50:39.2481018Z 	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 2020-09-23T21:50:39.2481727Z 	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 2020-09-23T21:50:39.2482192Z
> {noformat}
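The thread dump shows a lock-order inversion. "Temp writer" holds the PrioritizedDeque monitor (taken in SingleInputGate.waitAndGetNextData) and waits for the java.lang.Object monitor in LocalInputChannel.checkAndWaitForSubpartitionView. Meanwhile, the dispatcher thread holds that same Object monitor (taken in LocalInputChannel.requestSubpartition) and waits for the deque in SingleInputGate.queueChannel.

The standalone Java sketch below models only that cycle, using two plain monitors:

* The class, thread, and lock names are hypothetical stand-ins, not Flink code.
* The "notify outside the channel lock" variant is one generic way to break such a cycle. It is not necessarily the fix applied for this issue.
* Instead of hanging, the sketch uses ThreadMXBean.findMonitorDeadlockedThreads to confirm whether the cycle formed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

/**
 * Hypothetical two-monitor model of the cycle in the thread dump (not Flink code):
 * queueLock ~ PrioritizedDeque monitor, channelLock ~ Object monitor in requestSubpartition.
 */
public class LockOrderDemo {

    /** Runs one scenario; returns true if the JVM reports these two threads as monitor-deadlocked. */
    public static boolean runScenario(boolean notifyOutsideChannelLock) {
        final Object queueLock = new Object();
        final Object channelLock = new Object();
        // Forces both threads to hold their first lock before either requests its second one.
        final CountDownLatch firstLocksHeld = new CountDownLatch(2);

        // Models "Temp writer": holds the queue lock, then needs the channel lock.
        Thread writer = new Thread(() -> {
            synchronized (queueLock) {
                firstLocksHeld.countDown();
                awaitQuietly(firstLocksHeld);
                synchronized (channelLock) {
                    // stands in for checkAndWaitForSubpartitionView
                }
            }
        }, "Temp writer");

        // Models the dispatcher: holds the channel lock, then notifies the gate via the queue lock.
        Thread dispatcher = new Thread(() -> {
            synchronized (channelLock) {
                firstLocksHeld.countDown();
                awaitQuietly(firstLocksHeld);
                if (!notifyOutsideChannelLock) {
                    synchronized (queueLock) {
                        // stands in for queueChannel while channelLock is still held: inverted order
                    }
                }
            }
            if (notifyOutsideChannelLock) {
                synchronized (queueLock) {
                    // notification after channelLock was released: no lock cycle can form
                }
            }
        }, "dispatcher");

        // Daemon threads, so a deliberately deadlocked pair does not keep the JVM alive.
        writer.setDaemon(true);
        dispatcher.setDaemon(true);
        writer.start();
        dispatcher.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + 2_000;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = threads.findMonitorDeadlockedThreads();
            // Only count a cycle that involves this scenario's two threads.
            if (contains(ids, writer.getId()) && contains(ids, dispatcher.getId())) {
                return true;
            }
            if (!writer.isAlive() && !dispatcher.isAlive()) {
                return false; // both threads finished, so there was no deadlock
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    private static boolean contains(long[] ids, long id) {
        if (ids == null) {
            return false; // findMonitorDeadlockedThreads returns null when no deadlock exists
        }
        for (long candidate : ids) {
            if (candidate == id) {
                return true;
            }
        }
        return false;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("inverted order deadlocks: " + runScenario(false));               // prints true
        System.out.println("notify outside channel lock deadlocks: " + runScenario(true)); // prints false
    }
}
```

The latch makes the interleaving from the dump deterministic, so the inverted variant always deadlocks. Releasing the first monitor before taking the second removes the "hold A, wait for B" edge on one side, which is enough to rule out the cycle.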