[ https://issues.apache.org/jira/browse/FLINK-26568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504105#comment-17504105 ]
Matthias Pohl commented on FLINK-26568: --------------------------------------- Thanks to [~chesnay]: The stacktraces indicate that there is some race condition in reserving and freeing buffers (see [Stacktrace in logs|https://dev.azure.com/mapohl/flink/_build/results?buildId=845&view=logs&j=0a15d512-44ac-5ba5-97ab-13a5d066c22c&t=9a028d19-6c4b-5a4e-d378-03fca149d0b1&l=7700] where lock {{0x00000000afd2df30}} is held with other threads blocking on it): {code} "Flat Map -> Sink: Unnamed (9/12)#44786" #9135620 prio=5 os_prio=0 tid=0x00007fd2f41b5000 nid=0xcfd76 waiting for monitor entry [0x00007fd2b9042000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.setNumBuffers(LocalBufferPool.java:598) - waiting to lock <0x00000000afd2df30> (a java.util.ArrayDeque) at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.redistributeBuffers(NetworkBufferPool.java:652) at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.internalCreateBufferPool(NetworkBufferPool.java:509) - locked <0x0000000082e36d70> (a java.lang.Object) at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:451) at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.lambda$createBufferPoolFactory$3(SingleInputGateFactory.java:306) at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory$$Lambda$1153/781071075.get(Unknown Source) at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.setup(SingleInputGate.java:274) at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.setup(InputGateWithMetrics.java:105) at org.apache.flink.runtime.taskmanager.Task.setupPartitionsAndGates(Task.java:965) at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:652) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) at java.lang.Thread.run(Thread.java:748) "Flat Map -> Sink: Unnamed (6/12)#44786" #9135619 prio=5 os_prio=0 tid=0x00007fd2f4195000 nid=0xcfd75 in Object.wait() [0x00007fd2b9143000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.internalRequestMemorySegments(NetworkBufferPool.java:243) - locked <0x0000000082c7dda8> (a java.util.ArrayDeque) at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.requestPooledMemorySegmentsBlocking(NetworkBufferPool.java:179) at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.reserveSegments(LocalBufferPool.java:247) - locked <0x00000000afd2df30> (a java.util.ArrayDeque) at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.setupChannels(SingleInputGate.java:517) at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.setup(SingleInputGate.java:277) at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.setup(InputGateWithMetrics.java:105) at org.apache.flink.runtime.taskmanager.Task.setupPartitionsAndGates(Task.java:965) at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:652) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) at java.lang.Thread.run(Thread.java:748) {code} > BlockingShuffleITCase.testDeletePartitionFileOfBoundedBlockingShuffle timing > out on Azure > ----------------------------------------------------------------------------------------- > > Key: FLINK-26568 > URL: https://issues.apache.org/jira/browse/FLINK-26568 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination, Runtime / Task, Tests > Affects Versions: 1.15.0 > Reporter: Matthias Pohl > Priority: Critical > Labels: test-stability > Fix For: 1.15.0 > > > [This > build|https://dev.azure.com/mapohl/flink/_build/results?buildId=845&view=logs&j=0a15d512-44ac-5ba5-97ab-13a5d066c22c&t=9a028d19-6c4b-5a4e-d378-03fca149d0b1&l=12865] > timed out due the test > {{BlockingShuffleITCase.testDeletePartitionFileOfBoundedBlockingShuffle}} not > finishing. -- This message was sent by Atlassian Jira (v8.20.1#820001)