[ https://issues.apache.org/jira/browse/FLINK-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620782#comment-14620782 ]
Ufuk Celebi commented on FLINK-2341:
------------------------------------

Thanks for the stacktrace. I will look into it soon.

The asynchronous variant is not used by default, so this does not affect any user until it's fixed.

> Deadlock in SpilledSubpartitionViewAsyncIO
> ------------------------------------------
>
>                 Key: FLINK-2341
>                 URL: https://issues.apache.org/jira/browse/FLINK-2341
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Runtime
>    Affects Versions: 0.9, 0.10
>            Reporter: Stephan Ewen
>            Assignee: Ufuk Celebi
>            Priority: Critical
>             Fix For: 0.9, 0.10
>
>
> It may be that the deadlock is because of the way the {{SpilledSubpartitionViewTest}} is written
> {code}
> Found one Java-level deadlock:
> =============================
> "pool-25-thread-2":
>   waiting to lock monitor 0x00007f66f4932468 (object 0x00000000fa1478f0, a java.lang.Object),
>   which is held by "IOManager reader thread #1"
> "IOManager reader thread #1":
>   waiting to lock monitor 0x00007f66f4931160 (object 0x00000000fa029768, a java.lang.Object),
>   which is held by "pool-25-thread-2"
> Java stack information for the threads listed above:
> ===================================================
> "pool-25-thread-2":
> 	at org.apache.flink.runtime.io.network.partition.SpilledSubpartitionViewAsyncIO.notifyError(SpilledSubpartitionViewAsyncIO.java:304)
> 	- waiting to lock <0x00000000fa1478f0> (a java.lang.Object)
> 	at org.apache.flink.runtime.io.network.partition.SpilledSubpartitionViewAsyncIO.onAvailableBuffer(SpilledSubpartitionViewAsyncIO.java:256)
> 	at org.apache.flink.runtime.io.network.partition.SpilledSubpartitionViewAsyncIO.access$300(SpilledSubpartitionViewAsyncIO.java:42)
> 	at org.apache.flink.runtime.io.network.partition.SpilledSubpartitionViewAsyncIO$BufferProviderCallback.onEvent(SpilledSubpartitionViewAsyncIO.java:367)
> 	at org.apache.flink.runtime.io.network.partition.SpilledSubpartitionViewAsyncIO$BufferProviderCallback.onEvent(SpilledSubpartitionViewAsyncIO.java:353)
> 	at org.apache.flink.runtime.io.network.util.TestPooledBufferProvider$PooledBufferProviderRecycler.recycle(TestPooledBufferProvider.java:135)
> 	- locked <0x00000000fa029768> (a java.lang.Object)
> 	at org.apache.flink.runtime.io.network.buffer.Buffer.recycle(Buffer.java:119)
> 	- locked <0x00000000fa3a1a20> (a java.lang.Object)
> 	at org.apache.flink.runtime.io.network.util.TestSubpartitionConsumer.call(TestSubpartitionConsumer.java:95)
> 	at org.apache.flink.runtime.io.network.util.TestSubpartitionConsumer.call(TestSubpartitionConsumer.java:39)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:701)
> "IOManager reader thread #1":
> 	at org.apache.flink.runtime.io.network.util.TestPooledBufferProvider$PooledBufferProviderRecycler.recycle(TestPooledBufferProvider.java:127)
> 	- waiting to lock <0x00000000fa029768> (a java.lang.Object)
> 	at org.apache.flink.runtime.io.network.buffer.Buffer.recycle(Buffer.java:119)
> 	- locked <0x00000000fa3a1ea0> (a java.lang.Object)
> 	at org.apache.flink.runtime.io.network.partition.SpilledSubpartitionViewAsyncIO.returnBufferFromIOThread(SpilledSubpartitionViewAsyncIO.java:270)
> 	- locked <0x00000000fa1478f0> (a java.lang.Object)
> 	at org.apache.flink.runtime.io.network.partition.SpilledSubpartitionViewAsyncIO.access$100(SpilledSubpartitionViewAsyncIO.java:42)
> 	at org.apache.flink.runtime.io.network.partition.SpilledSubpartitionViewAsyncIO$IOThreadCallback.requestSuccessful(SpilledSubpartitionViewAsyncIO.java:338)
> 	at org.apache.flink.runtime.io.network.partition.SpilledSubpartitionViewAsyncIO$IOThreadCallback.requestSuccessful(SpilledSubpartitionViewAsyncIO.java:328)
> 	at org.apache.flink.runtime.io.disk.iomanager.AsynchronousFileIOChannel.handleProcessedBuffer(AsynchronousFileIOChannel.java:199)
> 	at org.apache.flink.runtime.io.disk.iomanager.BufferReadRequest.requestDone(AsynchronousFileIOChannel.java:431)
> 	at org.apache.flink.runtime.io.disk.iomanager.IOManagerAsync$ReaderThread.run(IOManagerAsync.java:377)
> {code}
> The full log with the deadlock stack traces can be found here:
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/70232347/log.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
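
The trace reads as a lock-order inversion: "pool-25-thread-2" holds the recycler monitor (0x00000000fa029768) and waits for the view monitor (0x00000000fa1478f0), while "IOManager reader thread #1" holds the view monitor and waits for the recycler monitor. The following is only a minimal sketch of that pattern, not Flink code: the class name, `viewLock`, `recyclerLock`, and the thread names are hypothetical stand-ins, and the latch exists only to make the interleaving deterministic. It uses the JDK's `ThreadMXBean.findMonitorDeadlockedThreads()` to confirm the cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderInversionSketch {

    // Starts a daemon thread that takes `first`, waits until both threads
    // hold their first monitor, then tries to take `second`.
    static Thread lockInOrder(String name, Object first, Object second,
                              CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Never reached when the other thread uses the opposite order.
                }
            }
        }, name);
        t.setDaemon(true); // let the JVM exit although the threads stay blocked
        t.start();
        return t;
    }

    // Returns true once the JVM reports both threads as monitor-deadlocked.
    static boolean deadlockDetected() {
        Object viewLock = new Object();     // hypothetical: the view's monitor
        Object recyclerLock = new Object(); // hypothetical: the recycler's monitor
        CountDownLatch latch = new CountDownLatch(2);

        // Consumer side: recycler lock first, then view lock.
        Thread consumer = lockInOrder("consumer", recyclerLock, viewLock, latch);
        // I/O side: view lock first, then recycler lock (opposite order).
        Thread ioReader = lockInOrder("io-reader", viewLock, recyclerLock, latch);

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            for (int attempt = 0; attempt < 100; attempt++) {
                long[] ids = mx.findMonitorDeadlockedThreads(); // null if none
                if (ids != null) {
                    boolean sawConsumer = false;
                    boolean sawIoReader = false;
                    for (long id : ids) {
                        if (id == consumer.getId()) sawConsumer = true;
                        if (id == ioReader.getId()) sawIoReader = true;
                    }
                    if (sawConsumer && sawIoReader) {
                        return true;
                    }
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + deadlockDetected());
    }
}
```

Two common ways out of such a cycle are to acquire the two monitors in the same order on every path, or to invoke the callback only after releasing the first monitor. Which of these, if either, applies to `SpilledSubpartitionViewAsyncIO` or to the test utilities is not established by the trace alone.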