Stephan Ewen created FLINK-4543:
-----------------------------------

             Summary: Race Deadlock in SpilledSubpartitionViewTest
                 Key: FLINK-4543
                 URL: https://issues.apache.org/jira/browse/FLINK-4543
             Project: Flink
          Issue Type: Improvement
          Components: Network
    Affects Versions: 1.1.2
            Reporter: Stephan Ewen
            Assignee: Stephan Ewen
             Fix For: 1.2.0

The test deadlocked (Java-level deadlock) with the following stack traces:

{code}
Found one Java-level deadlock:
=============================
"pool-1-thread-2":
  waiting to lock monitor 0x00007fec2c006168 (object 0x00000000ef661c20, a java.lang.Object),
  which is held by "IOManager reader thread #1"
"IOManager reader thread #1":
  waiting to lock monitor 0x00007fec2c005ea8 (object 0x00000000ef62c8a8, a java.lang.Object),
  which is held by "pool-1-thread-2"

Java stack information for the threads listed above:
===================================================
"pool-1-thread-2":
    at org.apache.flink.runtime.io.network.partition.SpilledSubpartitionViewAsyncIO.notifyError(SpilledSubpartitionViewAsyncIO.java:309)
    - waiting to lock <0x00000000ef661c20> (a java.lang.Object)
    at org.apache.flink.runtime.io.network.partition.SpilledSubpartitionViewAsyncIO.onAvailableBuffer(SpilledSubpartitionViewAsyncIO.java:261)
    at org.apache.flink.runtime.io.network.partition.SpilledSubpartitionViewAsyncIO.access$300(SpilledSubpartitionViewAsyncIO.java:42)
    at org.apache.flink.runtime.io.network.partition.SpilledSubpartitionViewAsyncIO$BufferProviderCallback.onEvent(SpilledSubpartitionViewAsyncIO.java:380)
    at org.apache.flink.runtime.io.network.partition.SpilledSubpartitionViewAsyncIO$BufferProviderCallback.onEvent(SpilledSubpartitionViewAsyncIO.java:366)
    at org.apache.flink.runtime.io.network.util.TestPooledBufferProvider$PooledBufferProviderRecycler.recycle(TestPooledBufferProvider.java:135)
    - locked <0x00000000ef62c8a8> (a java.lang.Object)
    at org.apache.flink.runtime.io.network.buffer.Buffer.recycle(Buffer.java:118)
    - locked <0x00000000ef9597c0> (a java.lang.Object)
    at org.apache.flink.runtime.io.network.util.TestConsumerCallback$RecyclingCallback.onBuffer(TestConsumerCallback.java:72)
    at org.apache.flink.runtime.io.network.util.TestSubpartitionConsumer.call(TestSubpartitionConsumer.java:87)
    at org.apache.flink.runtime.io.network.util.TestSubpartitionConsumer.call(TestSubpartitionConsumer.java:39)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
"IOManager reader thread #1":
    at org.apache.flink.runtime.io.network.util.TestPooledBufferProvider$PooledBufferProviderRecycler.recycle(TestPooledBufferProvider.java:126)
    - waiting to lock <0x00000000ef62c8a8> (a java.lang.Object)
    at org.apache.flink.runtime.io.network.buffer.Buffer.recycle(Buffer.java:118)
    - locked <0x00000000efa016f0> (a java.lang.Object)
    at org.apache.flink.runtime.io.network.partition.SpilledSubpartitionViewAsyncIO.returnBufferFromIOThread(SpilledSubpartitionViewAsyncIO.java:275)
    - locked <0x00000000ef661c20> (a java.lang.Object)
    at org.apache.flink.runtime.io.network.partition.SpilledSubpartitionViewAsyncIO.access$100(SpilledSubpartitionViewAsyncIO.java:42)
    at org.apache.flink.runtime.io.network.partition.SpilledSubpartitionViewAsyncIO$IOThreadCallback.requestSuccessful(SpilledSubpartitionViewAsyncIO.java:343)
    at org.apache.flink.runtime.io.network.partition.SpilledSubpartitionViewAsyncIO$IOThreadCallback.requestSuccessful(SpilledSubpartitionViewAsyncIO.java:333)
    at org.apache.flink.runtime.io.disk.iomanager.AsynchronousFileIOChannel.handleProcessedBuffer(AsynchronousFileIOChannel.java:199)
    at org.apache.flink.runtime.io.disk.iomanager.BufferReadRequest.requestDone(AsynchronousFileIOChannel.java:435)
    at org.apache.flink.runtime.io.disk.iomanager.IOManagerAsync$ReaderThread.run(IOManagerAsync.java:408)

Found 1 deadlock.
{code}
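
Reading the trace, the cycle looks like a lock-order inversion. "pool-1-thread-2" holds the pool recycler's lock inside recycle() and then needs the view's lock in notifyError(), while "IOManager reader thread #1" holds the view's lock inside returnBufferFromIOThread() and then needs the recycler's lock in recycle(). The sketch below is not Flink code and not necessarily the fix applied for this issue. It only illustrates one common way to break such a cycle: never call into the other component while holding your own lock. All class, field and method names (LockOrderSketch, SketchPool, SketchView, runDemo) are hypothetical stand-ins.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Illustrative only: hypothetical stand-ins for the pool and the view in the trace. */
final class LockOrderSketch {

    /** Stand-in for the test buffer pool and its recycler. */
    static final class SketchPool {
        private final Object poolLock = new Object();
        private final Queue<byte[]> free = new ArrayDeque<>();
        private SketchView listener;

        void setListener(SketchView view) {
            synchronized (poolLock) { listener = view; }
        }

        void recycle(byte[] buffer) {
            SketchView toNotify;
            synchronized (poolLock) {
                free.add(buffer);
                toNotify = listener;
            }
            // The callback runs only after poolLock is released, so it may
            // take the view's lock without creating a lock cycle.
            if (toNotify != null) {
                toNotify.onBufferAvailable();
            }
        }

        int available() {
            synchronized (poolLock) { return free.size(); }
        }
    }

    /** Stand-in for the spilled subpartition view. */
    static final class SketchView {
        private final Object viewLock = new Object();
        private final SketchPool pool;
        private byte[] pending;
        private int notifications;

        SketchView(SketchPool pool) { this.pool = pool; }

        void hold(byte[] buffer) {
            synchronized (viewLock) { pending = buffer; }
        }

        /** I/O-thread path: detach the buffer under the lock, recycle outside it. */
        void returnBufferFromIoThread() {
            byte[] toRecycle;
            synchronized (viewLock) {
                toRecycle = pending;
                pending = null;
            }
            if (toRecycle != null) {
                pool.recycle(toRecycle);
            }
        }

        void onBufferAvailable() {
            synchronized (viewLock) { notifications++; }
        }

        int notifications() {
            synchronized (viewLock) { return notifications; }
        }
    }

    /** Runs both code paths concurrently and returns the number of callbacks seen. */
    static int runDemo(int iterations) {
        SketchPool pool = new SketchPool();
        SketchView view = new SketchView(pool);
        pool.setListener(view);

        Thread ioThread = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                view.hold(new byte[8]);
                view.returnBufferFromIoThread();
            }
        });
        Thread consumer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                pool.recycle(new byte[8]);
            }
        });
        ioThread.start();
        consumer.start();
        try {
            ioThread.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return view.notifications();
    }
}
```

Because neither lock is ever acquired while the other is held, the two threads cannot wait on each other in a cycle. The trade-off is that callbacks run without the caller's lock, so they must tolerate state that changed between the unlock and the call.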