[ https://issues.apache.org/jira/browse/FLINK-9676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533080#comment-16533080 ]
ASF GitHub Bot commented on FLINK-9676:
---------------------------------------

Github user NicoK commented on the issue:

    https://github.com/apache/flink/pull/6254

    I updated the commit with a fixed unit test and added a big TODO on what else would need to be adapted in order to go with this solution (no updated commit message yet). Since this is even more intrusive and ugly than the changes before, I'll close this PR in favour of another solution developed in #6257.

> Deadlock during canceling task and recycling exclusive buffer
> -------------------------------------------------------------
>
>                 Key: FLINK-9676
>                 URL: https://issues.apache.org/jira/browse/FLINK-9676
>             Project: Flink
>          Issue Type: Bug
>          Components: Network
>    Affects Versions: 1.5.0
>            Reporter: zhijiang
>            Assignee: Nico Kruber
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.6.0, 1.5.1
>
>
> A deadlock may occur between the task canceler thread and the task thread.
> The details are as follows:
> {{Task canceler thread -> IC1#releaseAllResources -> recycle floating buffers
> -> lock(LocalBufferPool#availableMemorySegments) -> IC2#notifyBufferAvailable
> -> try to lock(IC2#bufferQueue)}}
> {{Task thread -> IC2#recycle -> lock(IC2#bufferQueue)
> -> bufferQueue#addExclusiveBuffer -> floatingBuffer#recycleBuffer
> -> try to lock(LocalBufferPool#availableMemorySegments)}}
> Each thread holds the lock that the other one is trying to acquire.
> One solution is to call {{listener#notifyBufferAvailable}} outside
> the {{synchronized(availableMemorySegments)}} block in {{LocalBufferPool#recycle}}.
> The existing {{RemoteInputChannelTest#testConcurrentOnSenderBacklogAndRecycle}}
> can cover this case, but the probability of hitting the deadlock is very low,
> so this unit test is unstable.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
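
The solution proposed in the issue description (calling {{listener#notifyBufferAvailable}} outside the {{synchronized(availableMemorySegments)}} block) can be illustrated with a minimal Java sketch. This is not Flink's actual LocalBufferPool code and not necessarily the fix developed in #6257; the class SketchBufferPool, its Listener interface, and the helper methods are hypothetical names invented for illustration. The idea is only that the listener is selected while the pool lock is held but invoked after the lock has been released, so the callback can take its own lock without a lock-order inversion.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical, simplified stand-in for a buffer pool; NOT Flink's LocalBufferPool. */
class SketchBufferPool {

    /** Hypothetical listener, loosely modelled on the callback named in the issue. */
    interface Listener {
        void notifyBufferAvailable(Object segment);
    }

    // The queue doubles as the lock object, mirroring the issue description.
    private final Queue<Object> availableMemorySegments = new ArrayDeque<>();
    private final Queue<Listener> registeredListeners = new ArrayDeque<>();

    void addListener(Listener listener) {
        synchronized (availableMemorySegments) {
            registeredListeners.add(listener);
        }
    }

    void recycle(Object segment) {
        Listener listener;
        synchronized (availableMemorySegments) {
            // Decide under the lock who gets the segment ...
            listener = registeredListeners.poll();
            if (listener == null) {
                // Nobody is waiting: return the segment to the pool.
                availableMemorySegments.add(segment);
                return;
            }
        }
        // ... but notify only after the pool lock has been released, so the
        // listener may acquire its own lock without holding the pool lock.
        listener.notifyBufferAvailable(segment);
    }

    int availableCount() {
        synchronized (availableMemorySegments) {
            return availableMemorySegments.size();
        }
    }

    /** Test helper (assumption): reports whether the caller holds the pool lock. */
    boolean isPoolLockHeldByCurrentThread() {
        return Thread.holdsLock(availableMemorySegments);
    }
}
```

With this ordering, a callback that locks its own queue never does so while the pool lock is held, which removes one of the two edges of the lock cycle described in the issue. A real implementation would also have to handle a listener that no longer needs the buffer, which this sketch leaves out.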