[ https://issues.apache.org/jira/browse/FLINK-16641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Piotr Nowojski updated FLINK-16641:
-----------------------------------
    Fix Version/s:     (was: 1.11.0)
                   1.12.0

> Announce sender's backlog to solve the deadlock issue without exclusive buffers
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-16641
>                 URL: https://issues.apache.org/jira/browse/FLINK-16641
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Network
>            Reporter: Zhijiang
>            Assignee: Yingjie Cao
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.12.0
>
>
> This is the second ingredient, besides FLINK-16404, needed to solve the
> deadlock problem without exclusive buffers.
> The scenario is as follows:
> * Data in a subpartition with a positive backlog can always be sent, because
> the exclusive credits are eventually fed back to the sender.
> * Without exclusive buffers, the receiver does not request floating buffers
> for a subpartition whose backlog is 0. When new backlog is later added to
> such a subpartition, the sender currently has no way to notify the receiver,
> because it holds no positive credit.
> * As a result, the sender and the receiver wait for each other and deadlock:
> the sender waits for credit in order to announce its backlog, and the
> receiver waits for a backlog announcement in order to request floating
> buffers.
> To solve this, the sender needs a separate message, in addition to the
> existing `BufferResponse`, that it can use to announce backlog in such cases.
> The receiver can then use this information to request floating buffers and
> feed the resulting credits back to the sender.
> The side effect is increased network transport delay and a possible
> throughput regression. We can measure the impact with the existing
> micro-benchmarks. This cost may be acceptable in exchange for faster
> checkpoints without exclusive buffers. We can document the trade-off in the
> respective configuration options so that users can make the final decision
> in practice.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
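
The deadlock scenario in the issue description can be illustrated with a toy credit model. This is only a sketch under simplifying assumptions, not Flink's implementation: the function `simulate`, its parameters, and the single-channel, immediately-recycled-buffer model are all hypothetical. The backlog is piggybacked on data messages (as with `BufferResponse`), and the optional standalone announcement stands in for the separate message proposed above.

```python
def simulate(announce_backlog, queued=3, floating_pool=2, max_steps=50):
    """Toy model of one channel with no exclusive buffers.

    Returns the number of buffers delivered before the system stops
    making progress. All names here are hypothetical illustrations.
    """
    backlog = queued       # buffers waiting in the sender's subpartition
    credit = 0             # credits the sender currently holds
    known_backlog = 0      # backlog as last reported to the receiver
    free = floating_pool   # floating buffers available at the receiver
    delivered = 0

    for _ in range(max_steps):
        progressed = False

        # Sender side: data can only be sent when credit is available;
        # the remaining backlog is piggybacked on the data message.
        if backlog > 0 and credit > 0:
            backlog -= 1
            credit -= 1
            delivered += 1
            known_backlog = backlog
            free += 1  # assumption: receiver recycles the buffer at once
            progressed = True
        # Proposed fix: a standalone backlog announcement that needs no credit.
        elif backlog > 0 and announce_backlog and known_backlog != backlog:
            known_backlog = backlog
            progressed = True

        # Receiver side: request floating buffers only for known backlog.
        grant = min(known_backlog - credit, free)
        if grant > 0:
            free -= grant
            credit += grant
            progressed = True

        if not progressed:
            break  # nothing left to do, or both sides wait on each other

    return delivered


if __name__ == "__main__":
    print("without announcement:", simulate(announce_backlog=False))
    print("with announcement:   ", simulate(announce_backlog=True))
```

In this model the run without the announcement stalls before any buffer is delivered, because the receiver never learns about the backlog and the sender never receives credit. With the announcement, every queued buffer is eventually delivered, at the cost of one extra message per stalled subpartition.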