[ https://issues.apache.org/jira/browse/FLINK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872870#comment-16872870 ]
Yingjie Cao commented on FLINK-12070:
-------------------------------------

[~StephanEwen] Theoretically, two read buffers are enough, but there can still be problems. Your concern is reasonable. Indeed, the old "SpillableSubpartition" has the same problem but shows different behavior, namely a deadlock (I fixed the deadlock problem when I tested the old implementation): the buffer request of the old "SpillableSubpartition" is blocking, so when there are not enough read buffers, the Netty thread is blocked instead of an exception being thrown.

The reader can be scheduled not only when a buffer of the same reader is sent out, but also when the downstream adds credit or when buffers of other readers that multiplex the same channel are sent out. So it is unsafe to assume that a buffer for reading ahead must be available whenever the reader is scheduled.

> Make blocking result partitions consumable multiple times
> ---------------------------------------------------------
>
>                 Key: FLINK-12070
>                 URL: https://issues.apache.org/jira/browse/FLINK-12070
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>    Affects Versions: 1.9.0
>            Reporter: Till Rohrmann
>            Assignee: Stephan Ewen
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.9.0
>
>         Attachments: image-2019-04-18-17-38-24-949.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In order to avoid writing produced results multiple times for multiple
> consumers and in order to speed up batch recoveries, we should make the
> blocking result partitions consumable multiple times. At the moment a
> blocking result partition will be released once the consumers have processed
> all data. Instead the result partition should be released once the next
> blocking result has been produced and all consumers of a blocking result
> partition have terminated.
> Moreover, blocking results should not hold on to slot
> resources like network buffers or memory, as is currently the case with
> {{SpillableSubpartitions}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
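
The scheduling concern in the comment above can be illustrated with a minimal sketch. Everything in it is hypothetical and simplified: `ReadBufferPool`, `WakeUp`, and `ReadAheadReader` are names made up for this example and are not Flink classes, and the real Netty-based reader is considerably more involved. The sketch only assumes what the comment states, namely that a reader can be woken up for three different reasons and that only one of them frees one of its own buffers. It therefore requests a buffer without blocking and simply skips the read-ahead when none is free, instead of blocking the thread or throwing.

```java
import java.util.ArrayDeque;
import java.util.Optional;

/** Hypothetical fixed-size pool of read buffers; not a Flink class. */
final class ReadBufferPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();

    ReadBufferPool(int numBuffers, int bufferSize) {
        for (int i = 0; i < numBuffers; i++) {
            free.add(new byte[bufferSize]);
        }
    }

    /** Non-blocking request: returns empty instead of waiting or throwing. */
    Optional<byte[]> tryRequest() {
        return Optional.ofNullable(free.poll());
    }

    void recycle(byte[] buffer) {
        free.add(buffer);
    }
}

/** The three wake-up causes named in the comment. */
enum WakeUp { OWN_BUFFER_SENT, CREDIT_ADDED, OTHER_READER_BUFFER_SENT }

/** Hypothetical reader that never assumes a buffer is free when it is scheduled. */
final class ReadAheadReader {
    private final ReadBufferPool pool;
    private final ArrayDeque<byte[]> inFlight = new ArrayDeque<>();
    private int skippedWakeUps;

    ReadAheadReader(ReadBufferPool pool) {
        this.pool = pool;
    }

    /** Returns true if a buffer was obtained for reading ahead. */
    boolean onScheduled(WakeUp cause) {
        Optional<byte[]> buffer = pool.tryRequest();
        if (!buffer.isPresent()) {
            // Added credit or another reader's sent buffer does not free one
            // of our buffers, so the cause is not used to infer availability.
            // Skip this round and retry on the next wake-up.
            skippedWakeUps++;
            return false;
        }
        inFlight.add(buffer.get()); // stands in for reading data into the buffer
        return true;
    }

    /** Called when the oldest in-flight buffer has been sent; recycles it. */
    void onBufferSent() {
        byte[] sent = inFlight.poll();
        if (sent != null) {
            pool.recycle(sent);
        }
    }

    int skippedWakeUps() {
        return skippedWakeUps;
    }
}

final class ReadAheadSketch {
    public static void main(String[] args) {
        ReadAheadReader reader = new ReadAheadReader(new ReadBufferPool(2, 32 * 1024));
        System.out.println(reader.onScheduled(WakeUp.OWN_BUFFER_SENT)); // true
        System.out.println(reader.onScheduled(WakeUp.CREDIT_ADDED));    // true
        // Both buffers are in flight; a credit wake-up finds no free buffer.
        System.out.println(reader.onScheduled(WakeUp.CREDIT_ADDED));    // false
        reader.onBufferSent();
        System.out.println(reader.onScheduled(WakeUp.OWN_BUFFER_SENT)); // true
    }
}
```

With two read buffers, the third wake-up in `main` is the case described in the comment: the reader is scheduled because credit was added, yet no buffer has come back. A blocking request would stall the calling thread at that point, while the non-blocking variant in this sketch just counts the skipped wake-up and waits for the next one.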