[ 
https://issues.apache.org/jira/browse/FLINK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855260#comment-16855260
 ] 

Yingjie Cao commented on FLINK-12070:
-------------------------------------

[~StephanEwen] There is no swap partition on my test OS. The data in the mmapped 
region was flushed to disk from the very beginning, that is, there was no lazy 
swapping. However, the disk write speed is far lower than the memory write 
speed, so memory will be exhausted sooner or later as long as the data volume 
is large enough. 

As for why the machine froze up, my guess is that flushing the mmapped region 
to disk also needs memory, and there were not enough free pages left.

I'd like to run some more tests following your suggestion and will post the 
results later. BTW, we made Blink spill to a file directly and read from the 
file directly, and there was no significant performance regression.
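
For illustration only, here is a minimal sketch of what "spill to a file 
directly" could look like when plain FileChannel writes are used instead of an 
mmapped region. This is not the actual Blink or Flink code; the class and 
method names (DirectFileSpillSketch, spill, readBack) are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical sketch: write and read partition data through a FileChannel,
// without mapping the file into memory.
final class DirectFileSpillSketch {

    // Appends the buffer's remaining bytes to the file.
    static void spill(Path file, ByteBuffer data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            // write() may write fewer bytes than requested, so loop.
            while (data.hasRemaining()) {
                ch.write(data);
            }
        }
    }

    // Reads the whole file back into a heap array (assumes it fits in an int-sized buffer).
    static byte[] readBack(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // keep reading until the buffer is full or EOF is reached
            }
            return Arrays.copyOf(buf.array(), buf.position());
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("spill-sketch", ".bin");
        try {
            byte[] payload = "partition-data".getBytes(StandardCharsets.UTF_8);
            spill(file, ByteBuffer.wrap(payload));
            System.out.println(Arrays.equals(payload, readBack(file)));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

With this approach the written pages still pass through the OS page cache, but 
the JVM does not keep a mapping over the whole file.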

[~srichter] The max heap memory in my test setting is fairly large, but most of 
it is never used (not allocated), so it should not have much impact. Maybe 
reducing max heap memory can improve the performance, but the problem will not 
be solved.

> Make blocking result partitions consumable multiple times
> ---------------------------------------------------------
>
>                 Key: FLINK-12070
>                 URL: https://issues.apache.org/jira/browse/FLINK-12070
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>    Affects Versions: 1.9.0
>            Reporter: Till Rohrmann
>            Assignee: Stephan Ewen
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.9.0
>
>         Attachments: image-2019-04-18-17-38-24-949.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In order to avoid writing produced results multiple times for multiple 
> consumers and in order to speed up batch recoveries, we should make 
> blocking result partitions consumable multiple times. At the moment a 
> blocking result partition is released once its consumers have processed 
> all data. Instead, the result partition should be released once the next 
> blocking result has been produced and all consumers of the blocking result 
> partition have terminated. Moreover, blocking results should not hold on to 
> slot resources like network buffers or memory, as is currently the case with 
> {{SpillableSubpartitions}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
