Zhijiang created FLINK-17823:
--------------------------------

             Summary: Resolve the race condition while releasing RemoteInputChannel
                 Key: FLINK-17823
                 URL: https://issues.apache.org/jira/browse/FLINK-17823
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Network
    Affects Versions: 1.11.0
            Reporter: Zhijiang
            Assignee: Zhijiang
             Fix For: 1.11.0


RemoteInputChannel#releaseAllResources might be called by the canceler thread
while the task thread concurrently calls RemoteInputChannel#getNextBuffer.
This can cause two problems:
 * The task thread might get a null buffer after the canceler thread has
already released all the buffers, which then causes a misleading NPE in
getNextBuffer.
 * The task thread and the canceler thread might pull the same buffer
concurrently, which causes an unexpected exception when that buffer is
recycled twice.

The solution is to properly synchronize on the buffer queue in the release
method, so that the same buffer cannot be pulled by both the canceler thread
and the task thread. In addition, getNextBuffer gets explicit checks that
replace the misleading NPE with meaningful exceptions.
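A minimal sketch of the approach, assuming a heavily simplified model:
SketchInputChannel and SketchBuffer are hypothetical stand-ins, not Flink's
RemoteInputChannel or Buffer, and the exception types and messages are
illustrative only. Every access to the queue holds its monitor, so each buffer
is handed to exactly one thread; getNextBuffer then distinguishes "channel
already released" from "queue unexpectedly empty" instead of passing on null.

```java
import java.util.ArrayDeque;

// Hypothetical buffer: recycling it twice is an error, mirroring the
// "recycled twice" failure described in the issue.
class SketchBuffer {
    private boolean recycled;

    synchronized void recycle() {
        if (recycled) {
            throw new IllegalStateException("Buffer recycled twice");
        }
        recycled = true;
    }

    synchronized boolean isRecycled() {
        return recycled;
    }
}

// Hypothetical, simplified input channel (NOT Flink's RemoteInputChannel).
class SketchInputChannel {
    // Every access to the queue happens while holding its monitor.
    private final ArrayDeque<SketchBuffer> receivedBuffers = new ArrayDeque<>();
    private volatile boolean released;

    // Network side: enqueue a received buffer, or recycle it if already released.
    void onBuffer(SketchBuffer buffer) {
        synchronized (receivedBuffers) {
            if (!released) {
                receivedBuffers.add(buffer);
                return;
            }
        }
        // Channel already released: nobody will consume this buffer.
        buffer.recycle();
    }

    // Task thread: take the next buffer or fail with an explicit exception.
    SketchBuffer getNextBuffer() {
        SketchBuffer next;
        synchronized (receivedBuffers) {
            next = receivedBuffers.poll();
        }
        if (next == null) {
            if (released) {
                throw new IllegalStateException(
                        "Queried for a buffer after the channel was released");
            }
            throw new IllegalStateException(
                    "No buffer queued although the channel is still open");
        }
        return next;
    }

    // Canceler thread: mark released and recycle whatever is still queued.
    void releaseAllResources() {
        synchronized (receivedBuffers) {
            if (released) {
                return;
            }
            released = true;
            SketchBuffer buffer;
            while ((buffer = receivedBuffers.poll()) != null) {
                buffer.recycle();
            }
        }
    }
}
```

Because release drains the queue under the same lock the task thread uses to
poll, a buffer already returned by getNextBuffer is never seen by the
canceler, and one still queued is never returned to the task.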



--
This message was sent by Atlassian Jira
(v8.3.4#803005)