[ https://issues.apache.org/jira/browse/FLINK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818365#comment-16818365 ]

Stephan Ewen commented on FLINK-12070:
--------------------------------------

I have a proof-of-concept for this feature in this branch: 
https://github.com/StephanEwen/incubator-flink/commits/mmappartition

It approaches the problem as follows:

*(1) Always write to a file channel directly*

This makes things a lot easier and should not be noticeably worse than the existing 
approach (a minimal writer sketch follows the list below):
  - Large results hit disk anyway, because the amount of network buffer memory 
per result is rather small.
  - For very small results, the data should still be in the OS disk cache and 
be fast to read as well (see below).
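For illustration, a minimal Java sketch of such a writer, assuming purely 
illustrative class and method names (they are not taken from the PoC branch):

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch, not from the PoC branch: each finished buffer of a
// blocking result partition is appended straight to a spill file, without
// going through a pool of network buffers.
final class FileChannelResultWriter implements AutoCloseable {

    private final FileChannel fileChannel;

    FileChannelResultWriter(Path spillFile) throws IOException {
        this.fileChannel = FileChannel.open(
                spillFile,
                StandardOpenOption.CREATE_NEW,
                StandardOpenOption.WRITE,
                StandardOpenOption.READ);
    }

    // Appends one serialized buffer; the channel position tracks the file end.
    void writeBuffer(ByteBuffer buffer) throws IOException {
        while (buffer.hasRemaining()) {
            fileChannel.write(buffer);
        }
    }

    @Override
    public void close() throws IOException {
        fileChannel.close();
    }
}
{code}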

*(2) Read through memory mapped file*

Rather than implementing complex readers with read requests and buffer pools, 
the code simply maps the result file into memory. All data that is still in the 
OS disk cache is immediately accessible at normal memory speed. Multiple reads 
to the result will share the same memory, saving on I/O and copy operations.
The OS will automatically evict the data from memory when there is not enough 
memory available.
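For illustration, a minimal sketch of the read side under the same assumptions 
(illustrative names, not the PoC code). A single MappedByteBuffer cannot exceed 
Integer.MAX_VALUE bytes, so large files are mapped as a list of read-only regions:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch, not from the PoC branch: the finished result file is
// mapped read-only into memory; large files are split into multiple regions
// because one mapping cannot exceed Integer.MAX_VALUE bytes.
final class MemoryMappedResultReader {

    private static final long MAX_REGION_SIZE = Integer.MAX_VALUE;

    // Maps the whole file into one or more read-only regions.
    static List<ByteBuffer> mapRegions(FileChannel fileChannel) throws IOException {
        final List<ByteBuffer> regions = new ArrayList<>();
        final long fileSize = fileChannel.size();

        for (long position = 0; position < fileSize; position += MAX_REGION_SIZE) {
            final long regionSize = Math.min(MAX_REGION_SIZE, fileSize - position);
            final MappedByteBuffer region =
                    fileChannel.map(FileChannel.MapMode.READ_ONLY, position, regionSize);
            regions.add(region);
        }
        return regions;
    }
}
{code}

Each consumer can then read from duplicate() views of these regions, so repeated 
reads share the same mapped pages: data still in the OS page cache is served at 
memory speed, while cold pages are dropped and re-read from disk transparently.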

*(3) Future proof for caching intermediate results*

For features like caching intermediate results, this has some nice implications:
  - Caching in memory and reading at memory speed
  - Automatic eviction to disk
  - No network buffer resource management that could impact or starve other 
tasks.


NOTE: The PoC is still missing the following:
  - Unit tests
  - Benchmarks to validate that this does not have a noticeable negative 
performance impact
  - Not releasing the partition after first consumption


> Make blocking result partitions consumable multiple times
> ---------------------------------------------------------
>
>                 Key: FLINK-12070
>                 URL: https://issues.apache.org/jira/browse/FLINK-12070
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>            Reporter: Till Rohrmann
>            Assignee: BoWang
>            Priority: Major
>
> In order to avoid writing produced results multiple times for multiple 
> consumers, and in order to speed up batch recoveries, we should make 
> blocking result partitions consumable multiple times. At the moment a 
> blocking result partition is released once its consumers have processed 
> all data. Instead, the result partition should be released once the next 
> blocking result has been produced and all consumers of the blocking result 
> partition have terminated. Moreover, blocking results should not hold on to 
> slot resources like network buffers or memory, as is currently the case with 
> {{SpillableSubpartitions}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
