[ https://issues.apache.org/jira/browse/FLINK-9676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533316#comment-16533316 ]
ASF GitHub Bot commented on FLINK-9676:
---------------------------------------

Github user zhijiangW commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6254#discussion_r200245464

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/RemoteInputChannel.java ---
    @@ -594,22 +626,22 @@ public String getMessage() {
         }

         /**
    -     * Adds an exclusive buffer (back) into the queue and recycles one floating buffer if the
    +     * Adds an exclusive buffer (back) into the queue and removes one floating buffer if the
          * number of available buffers in queue is more than the required amount.
          *
          * @param buffer The exclusive buffer to add
          * @param numRequiredBuffers The number of required buffers
          *
    -     * @return How many buffers were added to the queue
    +     * @return How many buffers were added to the queue (<tt>0</tt> or <tt>1</tt>) and the
    +     * floating buffer which was removed and should be released (outside!)
          */
    -    int addExclusiveBuffer(Buffer buffer, int numRequiredBuffers) {
    +    Tuple2<Integer, Optional<Buffer>> addExclusiveBuffer(Buffer buffer, int numRequiredBuffers) {
             exclusiveBuffers.add(buffer);
             if (getAvailableBufferSize() > numRequiredBuffers) {
    --- End diff --

    To be stricter, should `getAvailableBufferSize() > numRequiredBuffers` be
    `getAvailableBufferSize() == numRequiredBuffers + 1`? Otherwise a reader may
    assume that the number of available buffers can exceed the required number by
    more than one, even though we return only one floating buffer per call. In our
    design it can exceed the required number by at most one. This may be outside
    the scope of this JIRA, so feel free to ignore it.

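The Javadoc change in the diff says that the removed floating buffer "should be released (outside!)". A minimal, self-contained sketch of that pattern follows. It is not the code from the pull request: `SketchBuffer`, `AddResult`, `SketchBufferQueue` and `SketchInputChannel` are hypothetical stand-ins, `AddResult` replaces the `Tuple2<Integer, Optional<Buffer>>` return type, and the mapping of the return value (0 when a floating buffer is handed back, 1 otherwise) is an assumption about what "0 or 1" means.

```java
import java.util.ArrayDeque;
import java.util.Optional;

/** Hypothetical stand-in for a network buffer; it only records whether it was recycled. */
final class SketchBuffer {
    private boolean recycled;

    void recycleBuffer() { recycled = true; }

    boolean isRecycled() { return recycled; }
}

/** Hypothetical result type; the pull request uses Tuple2<Integer, Optional<Buffer>> instead. */
final class AddResult {
    final int numAdded;
    final Optional<SketchBuffer> toRelease;

    AddResult(int numAdded, Optional<SketchBuffer> toRelease) {
        this.numAdded = numAdded;
        this.toRelease = toRelease;
    }
}

/** Simplified queue holding exclusive and floating buffers; not thread-safe by itself. */
final class SketchBufferQueue {
    private final ArrayDeque<SketchBuffer> exclusiveBuffers = new ArrayDeque<>();
    private final ArrayDeque<SketchBuffer> floatingBuffers = new ArrayDeque<>();

    void addFloatingBuffer(SketchBuffer buffer) { floatingBuffers.add(buffer); }

    int getAvailableBufferSize() { return exclusiveBuffers.size() + floatingBuffers.size(); }

    /** Adds the exclusive buffer and hands back, without recycling it, a surplus floating buffer. */
    AddResult addExclusiveBuffer(SketchBuffer buffer, int numRequiredBuffers) {
        exclusiveBuffers.add(buffer);
        if (getAvailableBufferSize() > numRequiredBuffers) {
            // poll() returns null when no floating buffer is queued, hence ofNullable.
            // Assumption: 0 means the net number of available buffers did not grow.
            return new AddResult(0, Optional.ofNullable(floatingBuffers.poll()));
        }
        return new AddResult(1, Optional.empty());
    }
}

/** Simplified input channel that guards its queue with a lock. */
final class SketchInputChannel {
    private final SketchBufferQueue bufferQueue = new SketchBufferQueue();

    void addFloatingBuffer(SketchBuffer buffer) {
        synchronized (bufferQueue) {
            bufferQueue.addFloatingBuffer(buffer);
        }
    }

    int recycle(SketchBuffer exclusiveBuffer, int numRequiredBuffers) {
        AddResult result;
        synchronized (bufferQueue) {
            result = bufferQueue.addExclusiveBuffer(exclusiveBuffer, numRequiredBuffers);
        }
        // Released only after the bufferQueue lock is dropped, so recycling can never
        // try to take a buffer pool lock while this channel's lock is still held.
        result.toRelease.ifPresent(SketchBuffer::recycleBuffer);
        return result.numAdded;
    }
}
```

With this shape, the thread that recycles an exclusive buffer never holds the queue lock while the floating buffer goes back to its pool, which removes one side of the lock-order inversion described in the issue below.
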
> Deadlock during canceling task and recycling exclusive buffer
> -------------------------------------------------------------
>
>                 Key: FLINK-9676
>                 URL: https://issues.apache.org/jira/browse/FLINK-9676
>             Project: Flink
>          Issue Type: Bug
>          Components: Network
>    Affects Versions: 1.5.0
>            Reporter: zhijiang
>            Assignee: Nico Kruber
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.6.0, 1.5.1
>
>
> A deadlock can occur between the task canceler thread and the task thread.
> The details are as follows:
>
> Task canceler thread -> IC1#releaseAllResources -> recycle floating buffers
> -> lock(LocalBufferPool#availableMemorySegments) -> IC2#notifyBufferAvailable
> -> try to lock(IC2#bufferQueue)
>
> Task thread -> IC2#recycle -> lock(IC2#bufferQueue)
> -> bufferQueue#addExclusiveBuffer -> floatingBuffer#recycleBuffer
> -> try to lock(LocalBufferPool#availableMemorySegments)
>
> Each thread holds the lock that the other one is waiting for.
>
> One solution is to call listener#notifyBufferAvailable outside the
> synchronized(availableMemorySegments) block in LocalBufferPool#recycle.
>
> The existing RemoteInputChannelTest#testConcurrentOnSenderBacklogAndRecycle
> can cover this case, but the probability of hitting the deadlock is very low,
> so this unit test is not stable.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)