[ https://issues.apache.org/jira/browse/FLINK-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711113#comment-14711113 ]
Sachin Goel commented on FLINK-1730:
------------------------------------

Yes. Going through the entire Pact task logic, I observed all of that. I was almost surprised at how well it could support this functionality.

One idea I have is to implement two specific gates: an input gate, which resides directly on the memory manager, and an output gate, whose output is written to memory rather than transferred over the network. This way, the Pact task can just create one of these two gates and add it to the existing gates, depending on whether the results are available in the cache or not. After that, it's just a matter of initializing the {{NoOpDriver}}.

Further, although I'm not sure about it, the memory manager itself can spill data to disk if needed, right? If so, there is no need to implement something in-memory-cum-disk; it's already there.

The relevant storage on the memory manager will have locks based on task name and index, so that the cache is not cleared out until the accessing tasks have finished reading it. And we could perhaps follow an LRU scheme for clearing out the cache storage.

> Add a FlinkTools.persist style method to the Data Set.
> ------------------------------------------------------
>
>                 Key: FLINK-1730
>                 URL: https://issues.apache.org/jira/browse/FLINK-1730
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Stephan Ewen
>            Priority: Minor
>
> I think this is an operation that will be needed more prominently. Defining a
> point where one long logical program is broken into different executions.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
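
To make the cache scheme from the comment concrete (storage keyed by task name and index, entries held until their readers finish, LRU eviction otherwise), here is a minimal sketch. It is only an illustration under stated assumptions: `PinnedLruCache` and all of its methods are hypothetical names, not Flink classes, and plain byte arrays stand in for whatever the memory manager would actually hold. A reader count plays the role of the per-entry lock.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names exist in Flink.
class PinnedLruCache {
    // One cached result plus the number of tasks currently reading it.
    private static final class Entry {
        final byte[] data;
        int readers;
        Entry(byte[] data) { this.data = data; }
    }

    private final int capacity;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, Entry> entries =
            new LinkedHashMap<>(16, 0.75f, true);

    PinnedLruCache(int capacity) { this.capacity = capacity; }

    // Entries are identified by task name and subtask index.
    private static String key(String taskName, int subtaskIndex) {
        return taskName + "#" + subtaskIndex;
    }

    synchronized void put(String taskName, int subtaskIndex, byte[] data) {
        String k = key(taskName, subtaskIndex);
        Entry old = entries.get(k);
        if (old != null && old.readers > 0) {
            throw new IllegalStateException("entry is still being read: " + k);
        }
        entries.put(k, new Entry(data));
        evictIfNeeded();
    }

    // Returns the cached data and pins the entry, or null on a cache miss.
    synchronized byte[] acquire(String taskName, int subtaskIndex) {
        Entry e = entries.get(key(taskName, subtaskIndex));
        if (e == null) {
            return null;
        }
        e.readers++;
        return e.data;
    }

    // Called by a task once it has finished reading the entry.
    synchronized void release(String taskName, int subtaskIndex) {
        Entry e = entries.get(key(taskName, subtaskIndex));
        if (e != null && e.readers > 0) {
            e.readers--;
        }
    }

    synchronized boolean contains(String taskName, int subtaskIndex) {
        return entries.containsKey(key(taskName, subtaskIndex));
    }

    // Drops least recently used entries first, skipping any that are pinned.
    // If every older entry is pinned, the newly added entry is the one dropped.
    private void evictIfNeeded() {
        Iterator<Map.Entry<String, Entry>> it = entries.entrySet().iterator();
        while (entries.size() > capacity && it.hasNext()) {
            if (it.next().getValue().readers == 0) {
                it.remove();
            }
        }
    }
}
```

A task would call `acquire` before reading a cached result and `release` afterwards; a `null` from `acquire` corresponds to the case where the results are not in the cache and have to be produced and stored with `put`. Spilling to disk is left out here, since the comment expects the memory manager to handle it.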