[ https://issues.apache.org/jira/browse/FLINK-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711090#comment-14711090 ]
Stephan Ewen commented on FLINK-1730: ------------------------------------- There is a lot of support in Flink's runtime to actually store an intermediate data set persistent, starting in memory, going to disk if needed. It has not been connected to the APIs yet, because the committers that worked prioritized some other aspects (high availability) in the meantime. This feature should really work on the network stack level (caching buffers is most efficient) and the execution graph level (can recompute lost results upon failures, properly represent dependencies in the DAG). Concerning other caching levels: I am not sure any other level is really needed. Any disk level caching should start in memory as long as memory is available. Any memory-level caching that silently drops the cache and requires lineage-based recomputation is pretty unpredictable, so memory-destaging-to-disk is the most sensible version. > Add a FlinkTools.persist style method to the Data Set. > ------------------------------------------------------ > > Key: FLINK-1730 > URL: https://issues.apache.org/jira/browse/FLINK-1730 > Project: Flink > Issue Type: New Feature > Reporter: Stephan Ewen > Priority: Minor > > I think this is an operation that will be needed more prominently. Defining a > point where one long logical program is broken into different executions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)