[ 
https://issues.apache.org/jira/browse/FLINK-19391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvid Heise updated FLINK-19391:
--------------------------------
    Priority: Blocker  (was: Major)

> Deadlock during partition update
> --------------------------------
>
>                 Key: FLINK-19391
>                 URL: https://issues.apache.org/jira/browse/FLINK-19391
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.12.0
>            Reporter: Arvid Heise
>            Assignee: Arvid Heise
>            Priority: Blocker
>
> The master cron job is currently failing because of a deadlock introduced in FLINK-19026.
> {noformat}
> 2020-09-23T21:50:39.2444176Z Found one Java-level deadlock:
> 2020-09-23T21:50:39.2444633Z =============================
> 2020-09-23T21:50:39.2445001Z "Temp writer":
> 2020-09-23T21:50:39.2445484Z   waiting to lock monitor 0x00007f4e14004ca8 (object 0x0000000086501948, a java.lang.Object),
> 2020-09-23T21:50:39.2446418Z   which is held by "flink-akka.actor.default-dispatcher-2"
> 2020-09-23T21:50:39.2447193Z "flink-akka.actor.default-dispatcher-2":
> 2020-09-23T21:50:39.2447903Z   waiting to lock monitor 0x00007f4e14004bf8 (object 0x0000000086501930, a org.apache.flink.runtime.io.network.partition.PrioritizedDeque),
> 2020-09-23T21:50:39.2448703Z   which is held by "Temp writer"
> 2020-09-23T21:50:39.2448965Z 
> 2020-09-23T21:50:39.2449384Z Java stack information for the threads listed above:
> 2020-09-23T21:50:39.2449900Z ===================================================
> 2020-09-23T21:50:39.2450325Z "Temp writer":
> 2020-09-23T21:50:39.2451050Z  at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.checkAndWaitForSubpartitionView(LocalInputChannel.java:244)
> 2020-09-23T21:50:39.2452264Z  - waiting to lock <0x0000000086501948> (a java.lang.Object)
> 2020-09-23T21:50:39.2453183Z  at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.getNextBuffer(LocalInputChannel.java:205)
> 2020-09-23T21:50:39.2454173Z  at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.waitAndGetNextData(SingleInputGate.java:642)
> 2020-09-23T21:50:39.2455422Z  - locked <0x0000000086501930> (a org.apache.flink.runtime.io.network.partition.PrioritizedDeque)
> 2020-09-23T21:50:39.2456310Z  at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:619)
> 2020-09-23T21:50:39.2457311Z  at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNext(SingleInputGate.java:602)
> 2020-09-23T21:50:39.2458205Z  at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.getNext(InputGateWithMetrics.java:105)
> 2020-09-23T21:50:39.2459258Z  at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:100)
> 2020-09-23T21:50:39.2460465Z  at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:47)
> 2020-09-23T21:50:39.2461344Z  at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:59)
> 2020-09-23T21:50:39.2462164Z  at org.apache.flink.runtime.operators.TempBarrier$TempWritingThread.run(TempBarrier.java:178)
> 2020-09-23T21:50:39.2463418Z "flink-akka.actor.default-dispatcher-2":
> 2020-09-23T21:50:39.2464109Z  at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.queueChannel(SingleInputGate.java:825)
> 2020-09-23T21:50:39.2465336Z  - waiting to lock <0x0000000086501930> (a org.apache.flink.runtime.io.network.partition.PrioritizedDeque)
> 2020-09-23T21:50:39.2466228Z  at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.notifyChannelNonEmpty(SingleInputGate.java:791)
> 2020-09-23T21:50:39.2467222Z  at org.apache.flink.runtime.io.network.partition.consumer.InputChannel.notifyChannelNonEmpty(InputChannel.java:154)
> 2020-09-23T21:50:39.2468212Z  at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.notifyDataAvailable(LocalInputChannel.java:236)
> 2020-09-23T21:50:39.2469577Z  at org.apache.flink.runtime.io.network.partition.ResultPartitionManager.createSubpartitionView(ResultPartitionManager.java:76)
> 2020-09-23T21:50:39.2470607Z  at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.requestSubpartition(LocalInputChannel.java:133)
> 2020-09-23T21:50:39.2471765Z  - locked <0x0000000086501948> (a java.lang.Object)
> 2020-09-23T21:50:39.2472685Z  at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.updateInputChannel(SingleInputGate.java:489)
> 2020-09-23T21:50:39.2473727Z  - locked <0x0000000086532500> (a java.lang.Object)
> 2020-09-23T21:50:39.2474449Z  at org.apache.flink.runtime.io.network.NettyShuffleEnvironment.updatePartitionInfo(NettyShuffleEnvironment.java:279)
> 2020-09-23T21:50:39.2475394Z  at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$updatePartitions$12(TaskExecutor.java:758)
> 2020-09-23T21:50:39.2476235Z  at org.apache.flink.runtime.taskexecutor.TaskExecutor$$Lambda$406/1860601696.run(Unknown Source)
> 2020-09-23T21:50:39.2476973Z  at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
> 2020-09-23T21:50:39.2477714Z  at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> 2020-09-23T21:50:39.2478698Z  at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
> 2020-09-23T21:50:39.2479506Z  at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 2020-09-23T21:50:39.2480263Z  at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 2020-09-23T21:50:39.2481018Z  at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 2020-09-23T21:50:39.2481727Z  at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 2020-09-23T21:50:39.2482192Z 
> {noformat}
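
Reading the trace as a lock-order inversion: "Temp writer" holds monitor 0x0000000086501930 (the PrioritizedDeque) and waits for 0x0000000086501948, while "flink-akka.actor.default-dispatcher-2" holds 0x0000000086532500 and 0x0000000086501948 and waits for 0x0000000086501930. The sketch below is only an illustration of that reading, not Flink code: LockOrderCheck and hasInversion are hypothetical names, and the acquisition sequences are transcribed by hand from the "locked" / "waiting to lock" lines above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Flink: checks whether two code paths
// acquire some pair of monitors in opposite order.
public class LockOrderCheck {

    /**
     * Returns true if there are two locks that 'first' acquires in one order
     * and 'second' acquires in the reverse order (a potential deadlock).
     */
    public static boolean hasInversion(List<String> first, List<String> second) {
        for (int i = 0; i < first.size(); i++) {
            for (int j = i + 1; j < first.size(); j++) {
                // first takes lock i before lock j; look up both in second.
                int posEarlier = second.indexOf(first.get(i));
                int posLater = second.indexOf(first.get(j));
                if (posEarlier >= 0 && posLater >= 0 && posLater < posEarlier) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Acquisition order per thread, oldest first, as read from the trace.
        // The last entry of each list is the monitor the thread is waiting for.
        List<String> tempWriter = Arrays.asList(
                "0x0000000086501930 (PrioritizedDeque)",
                "0x0000000086501948 (Object)");
        List<String> dispatcher = Arrays.asList(
                "0x0000000086532500 (Object)",
                "0x0000000086501948 (Object)",
                "0x0000000086501930 (PrioritizedDeque)");

        System.out.println("lock-order inversion: " + hasInversion(tempWriter, dispatcher));
    }
}
```

With the two sequences from this trace the check reports an inversion on the pair 0x0000000086501930 / 0x0000000086501948. A common way to remove such a cycle is to make both paths take the two monitors in the same order, or to issue the notification only after the inner monitor has been released; which option suits FLINK-19026 is not stated in this report.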



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
