[ https://issues.apache.org/jira/browse/FLINK-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhijiang updated FLINK-8523:
----------------------------
    Comment: was deleted

(was: Hey [~pnowojski], [~NicoK]

Glad to see we are coming back to this issue. I think I understand your concerns, and there are actually two separate questions to settle:

1. Whether to spill intermediate buffers before barrier alignment?

If we spill the buffers that follow the barrier on an already blocked channel, as before, we free more floating buffers, which can then be used by the unblocked channels. From this point of view, spilling benefits barrier alignment. The concern is the additional IO cost of spilling and replaying the intermediate buffers. If alignment is fast, only a few intermediate buffers need to be spilled and they may still be in the OS cache, so the cost can be ignored. But if the spilled data is large and the environment is IO sensitive, it will hurt throughput (TPS) considerably.

If we do not spill, as in the current code, the concern is that we cannot make full use of the floating buffers before alignment, which may delay barrier alignment in some scenarios.

So, based on the above analysis, both approaches have advantages and disadvantages, and their behavior may differ between scenarios. In non-credit-based mode we have to spill the data to avoid deadlock, but now we have the chance to avoid spilling. It also seems better not to involve any disk IO in the runtime stack of a streaming job. For this reason I prefer not spilling. We may need more tests, feedback, or thought before the final decision.

2. Avoid requesting floating buffers for blocked channels

I think we can reach agreement on this one. Whatever the conclusion on the first question, this is reasonable and brings a definite benefit. This JIRA focuses on this issue.
BTW, we once made another improvement to speed up barrier alignment: reading unblocked channels with first priority instead of the current random order (FIFO based on network receiving). It does improve barrier alignment a lot, because the task no longer selects intermediate buffers that cannot be used before alignment. But this selection may also change the original back pressure behavior and affect performance in some scenarios, so it may also be a trade-off.)

> Stop assigning floating buffers for blocked input channels in exactly-once mode
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-8523
>                 URL: https://issues.apache.org/jira/browse/FLINK-8523
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Network
>    Affects Versions: 1.5.0, 1.6.0
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Major
>              Labels: pull-request-available
>
> In exactly-once mode, an input channel is set to the blocked state when a barrier is read from it. The blocked state is released after barrier alignment completes or is cancelled.
>
> In credit-based network flow control, we should avoid assigning floating buffers to blocked input channels, because the buffers after the barrier will not be processed by the operator until alignment.
> By doing so, we can make full use of floating buffers and speed up barrier alignment to some extent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
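
The idea in the issue description (do not hand floating buffers to channels that are blocked on a barrier) can be illustrated with a minimal, self-contained sketch. This is not Flink code: `FloatingBufferSketch`, `Channel`, `backlog`, and `assignFloatingBuffers` are hypothetical names chosen for illustration, and the round-robin policy is an assumption, not the behavior of Flink's network stack.

```java
import java.util.ArrayList;
import java.util.List;

public class FloatingBufferSketch {

    /** Hypothetical stand-in for an input channel (not a Flink class). */
    public static final class Channel {
        public final String name;
        public final int backlog;      // buffers the sender still wants to ship
        public boolean blocked;        // true after a barrier was read, until alignment
        public int floatingBuffers;    // floating buffers currently assigned

        public Channel(String name, int backlog, boolean blocked) {
            this.name = name;
            this.backlog = backlog;
            this.blocked = blocked;
        }
    }

    /**
     * Hands out floating buffers one at a time in round-robin order,
     * skipping blocked channels and channels whose backlog is covered.
     * Returns the number of buffers left unassigned.
     */
    public static int assignFloatingBuffers(List<Channel> channels, int available) {
        boolean assignedInRound = true;
        while (available > 0 && assignedInRound) {
            assignedInRound = false;
            for (Channel channel : channels) {
                if (available == 0) {
                    break;
                }
                // A blocked channel's data is not processed until alignment,
                // so giving it a floating buffer would only tie the buffer up.
                if (channel.blocked || channel.floatingBuffers >= channel.backlog) {
                    continue;
                }
                channel.floatingBuffers++;
                available--;
                assignedInRound = true;
            }
        }
        return available;
    }

    public static void main(String[] args) {
        List<Channel> channels = new ArrayList<>();
        channels.add(new Channel("A", 5, true));   // already received the barrier
        channels.add(new Channel("B", 3, false));
        channels.add(new Channel("C", 2, false));

        int left = assignFloatingBuffers(channels, 4);
        for (Channel c : channels) {
            System.out.println(c.name + " blocked=" + c.blocked
                    + " floating=" + c.floatingBuffers);
        }
        System.out.println("unassigned=" + left);
    }
}
```

In this sketch the blocked channel A receives nothing, so all four floating buffers go to B and C, the channels that still have to deliver their barriers.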