[ https://issues.apache.org/jira/browse/FLINK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818365#comment-16818365 ]
Stephan Ewen commented on FLINK-12070:
--------------------------------------

I have a proof-of-concept for this feature in this branch:
https://github.com/StephanEwen/incubator-flink/commits/mmappartition

It approaches the problem as follows:

*(1) Always write to a file channel directly*

This makes things a lot easier and should not be noticeably worse than the existing approach:
- Large results hit disk anyway, because the amount of network buffer memory per result is rather small.
- For very small results, the data should still be in the OS disk cache and thus fast to read as well (see below).

*(2) Read through a memory-mapped file*

Rather than implementing complex readers with read requests and buffer pools, the code simply maps the result file into memory. All data that is still in the OS disk cache is immediately accessible at normal memory speed. Multiple reads of the result share the same memory, saving I/O and copy operations. The OS automatically evicts the data from memory when not enough memory is available.

*(3) Future-proof for caching intermediate results*

For features like caching intermediate results, this has some nice implications:
- Caching in memory and reading at memory speed
- Automatic eviction to disk
- No network buffer resource management that could impact / starve other tasks

NOTE: The PoC is still missing the following:
- Unit tests
- Benchmarks to validate that this does not have a noticeable negative performance impact
- Not releasing the partition after the first consumption

> Make blocking result partitions consumable multiple times
> ---------------------------------------------------------
>
>                 Key: FLINK-12070
>                 URL: https://issues.apache.org/jira/browse/FLINK-12070
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>            Reporter: Till Rohrmann
>            Assignee: BoWang
>            Priority: Major
>
> In order to avoid writing produced results multiple times for multiple consumers, and in order to speed up batch recoveries, we should make blocking result partitions consumable multiple times. At the moment, a blocking result partition is released once its consumers have processed all data. Instead, the result partition should only be released once the next blocking result has been produced and all consumers of the blocking result partition have terminated. Moreover, blocking results should not hold on to slot resources such as network buffers or memory, as is currently the case with {{SpillableSubpartitions}}.
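For illustration, below is a minimal, self-contained Java sketch of steps (1) and (2) from the comment above: writing the produced data straight to a FileChannel and reading it back through a memory-mapped view. The class and method names ({{MmapBlockingPartitionSketch}}, {{writeResult}}, {{openConsumerView}}) are hypothetical and not taken from the linked PoC branch; a real implementation would additionally have to deal with Flink's network {{Buffer}} abstraction and with results larger than 2 GB, since {{FileChannel.map}} is limited to {{Integer.MAX_VALUE}} bytes per mapped region.

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative sketch only; names are made up and do not refer to classes in the PoC branch. */
public class MmapBlockingPartitionSketch {

    private final Path resultFile;

    public MmapBlockingPartitionSketch(Path resultFile) {
        this.resultFile = resultFile;
    }

    /** (1) Write the produced buffers straight to a file channel. */
    public void writeResult(Iterable<ByteBuffer> producedBuffers) throws IOException {
        try (FileChannel channel = FileChannel.open(
                resultFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            for (ByteBuffer buffer : producedBuffers) {
                // Large results hit disk anyway; small results stay in the OS page cache.
                while (buffer.hasRemaining()) {
                    channel.write(buffer);
                }
            }
            // Flush so that later readers see the complete result on disk.
            channel.force(false);
        }
    }

    /**
     * (2) Map the finished result into memory for a consumer. All consumers share the
     * same pages, and the OS evicts them transparently under memory pressure. The
     * mapping stays valid after the channel is closed.
     */
    public MappedByteBuffer openConsumerView() throws IOException {
        try (FileChannel channel = FileChannel.open(resultFile, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }
}
{code}

Under these assumptions, re-consuming the partition is simply another read-only mapping of the same file: nothing is written twice, and no network buffers need to be pinned for the lifetime of the result.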