Zhijiang created FLINK-16641:
--------------------------------

             Summary: Announce sender's backlog to solve the deadlock issue 
without exclusive buffers
                 Key: FLINK-16641
                 URL: https://issues.apache.org/jira/browse/FLINK-16641
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Network
            Reporter: Zhijiang
             Fix For: 1.11.0


This is the second ingredient, besides FLINK-16404, needed to solve the deadlock 
problem without exclusive buffers.

The scenario is as follows:
 * Data in a subpartition with a positive backlog can always be sent, because 
the exclusive credits are eventually fed back to the sender.
 * Without exclusive buffers, the receiver does not request floating buffers 
for a backlog of 0. But when new backlog is added to such a subpartition, the 
sender currently has no way to notify the receiver, because it holds no 
positive credits.
 * As a result, the receiver and sender wait for each other and deadlock: the 
sender waits for credit in order to announce its backlog, and the receiver 
waits for a backlog in order to request floating buffers.

To solve this problem, the sender needs a separate message, besides the 
existing `BufferResponse`, to announce its backlog in such cases. The receiver 
can then use this information to request floating buffers and feed credits back.
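
The exchange can be illustrated with a toy model. This is only a minimal sketch 
under stated assumptions, not Flink's network stack: `BacklogAnnouncementSketch`, 
`Sender`, `Receiver` and all of their methods are hypothetical names made up for 
this example, and the real message format and trigger conditions may differ. It 
assumes no exclusive buffers, a single channel, and a receiver that recycles each 
floating buffer right after consuming it.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of one sender subpartition and one receiver channel (hypothetical, not Flink code). */
final class BacklogAnnouncementSketch {

    /** Sender side: queued buffers plus the credits granted by the receiver. */
    static final class Sender {
        private final Queue<String> queuedBuffers = new ArrayDeque<>();
        private final boolean announcementEnabled;
        private int credits; // starts at 0 because there are no exclusive buffers

        Sender(boolean announcementEnabled) {
            this.announcementEnabled = announcementEnabled;
        }

        /** Queues a buffer; returns the backlog to announce, or 0 if nothing is announced. */
        int add(String buffer) {
            queuedBuffers.add(buffer);
            // Without credit no data message can carry the backlog,
            // so it has to travel in a separate announcement.
            if (announcementEnabled && credits == 0) {
                return queuedBuffers.size();
            }
            return 0;
        }

        void addCredits(int granted) {
            credits += granted;
        }

        /** Sends one buffer if credit is available; returns null otherwise. */
        String poll() {
            if (credits == 0 || queuedBuffers.isEmpty()) {
                return null;
            }
            credits--;
            return queuedBuffers.poll();
        }
    }

    /** Receiver side: owns only floating buffers. */
    static final class Receiver {
        private int floatingBuffersAvailable;

        Receiver(int floatingBuffers) {
            this.floatingBuffersAvailable = floatingBuffers;
        }

        /** Requests floating buffers for the announced backlog; returns the credits granted. */
        int onBacklogAnnounced(int backlog) {
            int granted = Math.min(backlog, floatingBuffersAvailable);
            floatingBuffersAvailable -= granted;
            return granted;
        }

        /** Returns a consumed buffer to the floating pool. */
        void recycle() {
            floatingBuffersAvailable++;
        }
    }

    /** Runs the exchange and returns how many buffers reached the receiver. */
    static int run(boolean announcementEnabled, int buffersToSend, int floatingBuffers) {
        Sender sender = new Sender(announcementEnabled);
        Receiver receiver = new Receiver(floatingBuffers);
        int delivered = 0;
        for (int i = 0; i < buffersToSend; i++) {
            int announcedBacklog = sender.add("buffer-" + i);
            if (announcedBacklog > 0) {
                sender.addCredits(receiver.onBacklogAnnounced(announcedBacklog));
            }
            while (sender.poll() != null) {
                delivered++;
                receiver.recycle();
            }
        }
        return delivered;
    }
}
```

In this model, `run(false, 3, 2)` delivers nothing because the sender never 
obtains credit, which is the deadlock described above, while `run(true, 3, 2)` 
delivers all three buffers.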

The side effect is increased network transport delay and a throughput 
regression. We can measure the impact with the existing micro-benchmarks. This 
cost may be acceptable in exchange for faster checkpoints without exclusive 
buffers. We can explain the trade-off in the descriptions of the respective 
configuration options and let users make the final decision in practice.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
